When Should You Adjust Standard Errors for Clustering?

The questions addressed in this article partly originated in discussions with Gary Chamberlain. We are grateful for questions raised by Chris Blattman and seminar audiences, and for insightful comments by Colin Cameron, Vicente Guerra, four reviewers, Larry Katz, and Jesse Shapiro. Jaume Vives-i-Bastida provided expert research assistance. This work was supported by the Office of Naval Research under grants N00014-17-1-2131 and N00014-19-1-2468.
Alberto Abadie (MIT), Susan Athey (Stanford), Guido W. Imbens (Stanford), Jeffrey M. Wooldridge (MSU)

March 14, 2024


Abstract

Clustered standard errors, with clusters defined by factors such as geography, are widespread in empirical research in economics and many other disciplines. Formally, clustered standard errors adjust for the correlations induced by sampling the outcome variable from a data-generating process with unobserved cluster-level components. However, the standard econometric framework for clustering leaves important questions unanswered: (i) Why do we adjust standard errors for clustering in some ways but not others, e.g., by state but not by gender, and in observational studies, but not in completely randomized experiments? (ii) Why is conventional clustering an “all-or-nothing” adjustment, while within-cluster correlations can be strong or extremely weak? (iii) In what settings does the choice of whether and how to cluster make a difference? We address these and other questions using a novel framework for clustered inference on average treatment effects. In addition to the common sampling component, the new framework incorporates a design component that accounts for the variability induced on the estimator by the treatment assignment mechanism. We show that, when the number of clusters in the sample is a non-negligible fraction of the number of clusters in the population, conventional cluster standard errors can be severely inflated, and propose new variance estimators that correct for this bias.

1 Introduction

Imagine you estimated the effect of attending college on labor earnings using linear regression on a cross-section of U.S. workers. How should you calculate the standard error? Empirical studies in economics often report heteroskedasticity-robust standard errors (henceforth “robust”) associated with the work by Eicker (1963), Huber (1967), and White (1980). A common alternative is to report cluster-robust standard errors (henceforth “cluster”) associated with the work by Liang and Zeger (1986) and Arellano (1987), with clustering often applied within geographic units such as states or counties. Moulton (1986, 1987) and Bertrand, Duflo, and Mullainathan (2004) have shown that clustering adjustments can make a substantial difference, and since the 1980s cluster standard errors have become commonplace in empirical economics.

Later in this section, we estimate a log-linear regression of earnings on an indicator for some college using data from the 2000 U.S. Census. We find that standard errors clustered at the state level are more than 20 times larger than robust standard errors. Which ones should a researcher report? The conventional framework for clustering [see Cameron and Miller (2015) and MacKinnon, Nielsen, and Webb (2021) for recent reviews] suggests that if the clustering adjustment matters, in the sense that the cluster standard errors are substantially larger than the robust standard errors, one should use the cluster standard errors. In this article, we develop a new framework for cluster adjustments to standard errors that nests the conventional framework as a limiting case. The new framework suggests novel standard error formulas that can substantially improve over robust and cluster standard errors in settings like the earnings regression described above.

Our proposed clustering framework differs from the standard one in that it includes a design component that accounts for between-clusters variation in treatment assignments. We argue that the new design component is important because between-cluster variation in treatment assignments often motivates the use of clustered standard errors in empirical studies [see, e.g., Gentzkow and Shapiro, 2008; Cohen and Dupas, 2010]. In addition, our framework shifts the focus of interest from features of infinite super-populations/data-generating processes to average treatment effects defined for the finite (but potentially large) population at hand. As a result of this shift, it is the sampling process and the treatment assignment mechanism that solely determine the correct level of clustering; the presence of cluster-level unobserved components of the outcome variable becomes irrelevant for the choice of clustering level. Moreover, by focusing on finite populations (which could be entirely or substantially sampled in the data) we obtain standard errors smaller than those aiming to measure uncertainty with respect to features of infinite super-populations. We derive the large sample variances for the least squares and fixed effect estimators under our proposed framework and show that they differ in general from both the robust and the cluster variances. We also propose two estimators for the large sample variances, one analytic and one based on a re-sampling (bootstrap) approach. For the U.S. earnings application, our proposals produce standard errors that are substantially larger than the robust standard errors, but also substantially smaller than the conventional version of cluster standard errors.

We use our framework to highlight three common misconceptions surrounding clustering adjustments. The first misconception is that the need for clustering hinges on the presence of a non-zero correlation between residuals for units belonging to the same cluster. We show that the presence of such correlation does not imply the need to use cluster adjustments, and that the absence of such correlation does not imply that clustering is not required. The second misconception is that there is no harm in using clustering adjustments when they are not required, with the implication that if clustering the standard errors makes a difference, one should do so. To see that both of these claims are in fact incorrect, consider the following simple example. Suppose that, based on a random sample from the population of interest, we use the sample average of a variable to estimate its population mean. Suppose also that the population can be partitioned into clusters such as geographical units. If outcomes are positively correlated within clusters, the cluster variance will be larger than the robust variance. However, standard sampling theory directly implies that if the units are sampled randomly from the population there is no need to cluster. The harm in clustering in this case is that confidence intervals will be unnecessarily conservative, possibly by a wide margin. A third misconception is that researchers have only two choices: either fully adjust for clustering and use the cluster standard errors, or not adjust the standard errors at all and use the robust standard errors. We show that a combination of the robust and the cluster variance estimators can substantially improve accuracy over its two components.
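The sample-mean example can be checked with a small simulation. The sketch below is illustrative only: the population (40 clusters of 250 units, a common cluster-level shift plus unit-level noise), the sample size of 500, and all function names are assumptions made for this example and do not come from the article. It draws simple random samples of units and compares the actual spread of the sample mean across samples with the robust standard error and with a cluster standard error computed without degrees-of-freedom corrections.

```python
import random
import statistics
from collections import defaultdict

def make_population(n_clusters=40, cluster_size=250, seed=0):
    """Fixed finite population: outcome = cluster shift + unit noise (assumed design)."""
    rng = random.Random(seed)
    pop = []
    for c in range(n_clusters):
        shift = rng.gauss(0.0, 1.0)          # component shared within the cluster
        for _ in range(cluster_size):
            pop.append((c, shift + rng.gauss(0.0, 1.0)))
    return pop

def robust_se_mean(sample):
    """Standard error of the sample mean based on squared unit-level residuals."""
    y = [v for _, v in sample]
    n = len(y)
    ybar = sum(y) / n
    return (sum((v - ybar) ** 2 for v in y) / n**2) ** 0.5

def cluster_se_mean(sample):
    """Cluster standard error: squares of within-cluster sums of residuals."""
    y = [v for _, v in sample]
    n = len(y)
    ybar = sum(y) / n
    sums = defaultdict(float)
    for c, v in sample:
        sums[c] += v - ybar
    return (sum(s ** 2 for s in sums.values()) / n**2) ** 0.5

pop = make_population()
rng = random.Random(1)
means, robust, cluster = [], [], []
for _ in range(400):
    sample = rng.sample(pop, 500)            # simple random sample of units
    means.append(sum(v for _, v in sample) / len(sample))
    robust.append(robust_se_mean(sample))
    cluster.append(cluster_se_mean(sample))

true_sd = statistics.pstdev(means)           # spread across repeated samples
avg_robust = sum(robust) / len(robust)
avg_cluster = sum(cluster) / len(cluster)
print(f"sd of sample mean:  {true_sd:.4f}")
print(f"average robust SE:  {avg_robust:.4f}")
print(f"average cluster SE: {avg_cluster:.4f}")
```

Under these assumptions the robust standard error should track the spread of the sample mean across repeated random samples, while the cluster standard error should be considerably larger, which is the sense in which clustering is unnecessarily conservative here.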

The new clustering framework in this article has the advantage of providing actionable guidance on a question of substantial consequence for empirical practice in econometrics: When should standard errors be clustered, and at what level? In the conventional model-based econometric framework, the researcher takes a stand on the error component structure of a model for the outcome variable. For example, suppose that, following Moulton (1986, 1987), the researcher posits a random effects model, with random effects at the state level. In this setting, a repeated sampling thought experiment entails that, for each sample, different values of the state random effects are drawn from their distributions. This model-based approach implies that if we are estimating a population mean using a sample average one needs to cluster the standard errors at the state level even if the sample is a random sample of individuals and not a clustered sample. A drawback of the model-based econometric framework for clustering is that empirical researchers need to take a stand on the structure of the error components of their models.

A second, closely related, framework for clustering that is often invoked in the econometrics literature is motivated by a sampling mechanism that in a first stage selects clusters at random from an infinite population, followed by a second stage of random sampling of units from the sampled clusters (or keeping all units in a cluster). Although this framework is appropriate for some applications in the analyses of surveys, where it originated [Kish, 1995; Thompson, 2012], we argue that it is not appropriate for many of the data sets economists and other social scientists analyze. In many applications in economics, researchers do observe units from all the clusters they are interested in, e.g., all the states in the U.S., and a framework based on randomly sampling a small fraction of a large population of clusters does not apply.

Neither of the two conventional frameworks for clustered inference described above fully incorporates the design aspect of clustering. And it is the lack of a design component that makes them inappropriate for inference on treatment effects. To gain insight on the importance of the assignment mechanism for the standard errors of treatment effects estimators, consider a setting with individuals sampled at random from a population, but where treatment is assigned at the cluster level, with the same treatment value for all the individuals in the same cluster. Assume that the quantity of interest is the population average treatment effect. Clustered assignment to treatment is equivalent to clustered sampling of potential outcomes. Because the parameter of interest depends on averages of potential outcomes, which are sampled in a clustered manner, clustering of the standard errors is required in this setting, even when the individual observations are sampled at random. Our framework for clustered inference in this setting is close in spirit to the sampling framework described in the previous paragraph, but it incorporates a design component.

By shifting the attention from parameters of a data generating process for the outcomes to the average treatment effect for the population at hand, a researcher applying the proposals in this article does not need to take a stand on the error component structure of a model for the outcome variable to calculate standard errors. Instead, all the relevant variability of the estimator with respect to the average treatment effect is generated by the sampling mechanism, which extracts the sample from the population, and the assignment mechanism, which determines which units are exposed to the treatment. We see this as an intrinsic advantage of the framework proposed in this article in settings where it is difficult to justify a particular error component structure.

In this article we make three contributions. The first one is a novel framework for clustering, building on the one developed by Abadie et al. (2020) for the analysis of regression estimators from a design perspective. We allow for clustering both in the sampling process and in the assignment process. As a result, the framework nests both the traditional case of clustered sampling and the case of clustered treatment assignment in experiments as special cases. It also allows for intermediate cases. In particular, treatment assignment may depend on cluster but not perfectly so, so that variation in treatments remains within clusters. This framework clarifies the separate roles of clustering in the sampling process and clustering in the assignment process. It also clarifies what we can learn from the data about the need to adjust standard errors for clustering. In our framework, the data are not informative about the need to adjust for clustering in the sampling process, but they are informative about the need to adjust for clustering in the assignment process.

In our second contribution, we derive central limit theorems and large sample variances for the least squares and the fixed effect estimators of average treatment effects that take into account variation both from sampling and assignment. Comparing these variances to limit versions of the robust and cluster variances shows that the robust standard errors are generally too small, and the cluster standard errors are unnecessarily conservative. These comparisons also highlight how heterogeneity in treatment effects affects inference in the estimation of average treatment effects. Often researchers specify models that implicitly assume constant treatment effects without appreciating the implications for inference. We show, however, that heterogeneity in treatment effects introduces additional variance components that affect the need for clustering adjustments.

In our third contribution, we propose new variance formulas and bootstrap procedures for treatment effects estimators in the presence of clustering. We use the term Causal Cluster Variance (CCV) for the analytic variance formulas. For the case of a least squares estimator of average treatment effects, the intuition for the CCV formula is as follows. The error of the least squares estimator is approximately equal to a sum, over all units, of an expression involving products of regression errors and regressor values. The robust variance is approximately equal to a sum, over all units, of the squares of these products. In contrast, the conventional cluster variance estimator is approximately equal to a sum, over all clusters, of squares of within-cluster sums of the same products. Although the sum over all clusters of the expectation of the within-cluster sums of these products is zero, the expectation for each cluster separately is not. For each cluster in the sample, it is possible to estimate the expectation of the sum of the products between regression errors and regressor values. The CCV formula uses these estimates to correct the bias of the conventional cluster variance. The CCV correction does not help much if only a small fraction of clusters are sampled. However, when a large fraction of the clusters are represented in the sample, the CCV correction can lead to substantial improvements. This adjustment relies on estimates of cluster-level treatment effects, and thus requires within-cluster variation in treatment assignment. In addition, we propose a bootstrap version of the variance estimator, which we compare to two benchmarks. In contrast to conventional bootstrap procedures, which are based on resampling individual units or entire clusters of units, our proposed Two-Stage-Cluster-Bootstrap (TSCB) conducts resampling in two stages. In the first stage, the fraction treated for each cluster is drawn from the empirical distribution of cluster-specific treatment fractions. In the second stage, the researcher samples the treated and control units from each cluster, with their number of units determined in the first stage. The CCV and TSCB variance estimators are designed for applications with a large number of observations and substantial variation in treatment assignment within clusters.
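The two resampling stages can be sketched in code. What follows is a simplified reading of the verbal description above, not the authors' algorithm: it assumes that every cluster contains both treated and control units, resamples with replacement, rounds the drawn treated count to the nearest integer, and applies the procedure to a difference in means. The function names and the toy data are hypothetical.

```python
import random
import statistics

def diff_in_means(units):
    """Difference in mean outcomes between treated (w=1) and control (w=0)."""
    y1 = [y for _, w, y in units if w == 1]
    y0 = [y for _, w, y in units if w == 0]
    return sum(y1) / max(len(y1), 1) - sum(y0) / max(len(y0), 1)

def two_stage_cluster_bootstrap(units, n_boot=200, seed=0):
    """Simplified two-stage resampling; `units` is a list of (cluster, w, y).

    Assumes every cluster contains both treated and control units.
    """
    rng = random.Random(seed)
    treated, control = {}, {}
    for c, w, y in units:
        (treated if w == 1 else control).setdefault(c, []).append(y)
    clusters = sorted(set(treated) | set(control))
    sizes = {c: len(treated.get(c, [])) + len(control.get(c, [])) for c in clusters}
    fractions = [len(treated.get(c, [])) / sizes[c] for c in clusters]

    estimates = []
    for _ in range(n_boot):
        draw = []
        for c in clusters:
            # Stage 1: draw this cluster's treated fraction from the
            # empirical distribution of cluster-specific fractions.
            frac = rng.choice(fractions)
            n1 = round(frac * sizes[c])
            n0 = sizes[c] - n1
            # Stage 2: resample treated and control units within the cluster.
            draw += [(c, 1, y) for y in rng.choices(treated[c], k=n1)]
            draw += [(c, 0, y) for y in rng.choices(control[c], k=n0)]
        estimates.append(diff_in_means(draw))
    return statistics.pstdev(estimates)

# Toy data (made up): 10 clusters of 60 units, constant treatment effect of 1.
data_rng = random.Random(42)
data = []
for c in range(10):
    p_treat = data_rng.uniform(0.3, 0.7)
    for _ in range(60):
        w = 1 if data_rng.random() < p_treat else 0
        data.append((c, w, 1.0 * w + data_rng.gauss(0.0, 1.0)))

se = two_stage_cluster_bootstrap(data)
print(f"bootstrap standard error: {se:.3f}")
```

The point of the sketch is the order of operations: the treated share is redrawn at the cluster level before units are redrawn within clusters, so the resampling reproduces between-cluster variation in treatment assignment as well as unit-level sampling variation.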

To illustrate the empirical relevance of our results, we analyze a sample from the 2000 U.S. Decennial Census, which includes 2,632,838 individuals. We define 52 clusters according to residency in the 50 states, Puerto Rico, and the District of Columbia. We consider two log-linear regressions of individual earnings on a treatment variable that encodes information on college attendance. In the first specification, the treatment variable is measured as an average, at the state level. In a second specification, we measure college attendance at the individual level.

Table 1: College effects in the Census sample
Dependent variable: log labor earnings

Panel A. Treatment: state indicator for share of some college greater than 0.55

                                        OLS
coefficient                             0.1022
standard error:
  robust                                (0.0012)
  cluster                               (0.0312)

Panel B. Treatment: individual indicator for some college

                                        OLS        FE
coefficient                             0.4656     0.4570
standard error:
  robust                                (0.0012)   (0.0012)
  cluster                               (0.0269)   (0.0276)
  causal cluster variance (CCV)         (0.0035)   (0.0014)
  two-stage cluster bootstrap (TSCB)    (0.0036)   (0.0014)

In Panel A of Table 1, we report results for a regression where the only explanatory variable is a binary treatment that takes value one if the fraction of individuals with at least some college residing in the state is 0.55 or higher, and value zero otherwise (we chose the 0.55 value to ensure sufficient variation in the treatment over the 52 clusters). Notice that the treatment is constant within states. We report the ordinary least squares (OLS) estimate, as well as robust and cluster standard errors. Since the late 1980s, it has been common practice to report cluster standard errors in settings where the regressors are constant within a cluster. Clustering at the state level makes a substantial difference relative to using robust standard errors, with the cluster standard errors approximately twenty-six times larger than the robust standard errors.

In Panel B of Table 1, the sole regressor is an individual-level indicator for at least some college. In addition to OLS, we report the fixed effects (FE) estimate (with fixed effects for the 50 states, plus Washington DC and Puerto Rico) and robust, cluster, CCV, and TSCB standard errors in parentheses. As in the regression in Panel A, clustering at the state level makes a substantial difference in the standard errors, with the cluster standard errors approximately twenty-three times larger than the robust standard errors, both for the OLS and the FE regressions. In Panel B, our proposed CCV and TSCB standard errors for the OLS estimate are 0.0035 and 0.0036, respectively, in between the robust standard error (0.0012) and the cluster standard error (0.0269), and substantially different from both. The same holds for the FE estimator: the cluster standard error is 0.0276, quite different from the robust standard error, 0.0012, and the CCV and TSCB standard errors are 0.0014, in between robust and cluster but much closer to robust.
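The ratios quoted in the discussion of Table 1 can be reproduced directly from the reported standard errors. The short script below only restates the numbers in the table; the dictionary layout and key names are choices made for this illustration.

```python
# Standard errors reported in Table 1.
table_se = {
    "A_OLS": {"robust": 0.0012, "cluster": 0.0312},
    "B_OLS": {"robust": 0.0012, "cluster": 0.0269, "ccv": 0.0035, "tscb": 0.0036},
    "B_FE":  {"robust": 0.0012, "cluster": 0.0276, "ccv": 0.0014, "tscb": 0.0014},
}

for name, row in table_se.items():
    line = f"{name}: cluster/robust = {row['cluster'] / row['robust']:.1f}"
    if "ccv" in row:
        line += f", ccv/robust = {row['ccv'] / row['robust']:.1f}"
        line += f", cluster/ccv = {row['cluster'] / row['ccv']:.1f}"
    print(line)
```

The cluster-to-robust ratios are 26 in Panel A and roughly 22 to 23 in Panel B. The CCV standard error is roughly 2.9 times the robust one for OLS and roughly 1.2 times for FE, while the conventional cluster standard error is several times larger than CCV in both columns.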

2 A Framework for Clustering

In this section, we describe in detail the framework for our analysis. There are multiple components to our set-up that are not explicitly modeled in the usual analysis of the variance of econometric estimators. In general, quantifying the uncertainty of parameter estimates requires describing the population and articulating the assumptions that describe how the sample was generated from that population (that is, building a model for the data generating process). In our framework, there are three distinct sources of sampling variation that lead to variation in the estimates. First, there is variation across samples in which units are observed in each cluster. Second, there is potentially variation in which clusters are observed (which leads to different units being observed). Third, there is variation in the treatment assignment across units. Whereas the standard framework for clustering focuses solely on the first two (sampling) sources of uncertainty, our proposed framework allows for all three. How much these three components matter for the variance of the least squares and fixed effects estimators of the average treatment effect depends on (i) the sampling process, (ii) the assignment process, and (iii) the heterogeneity in the treatment effects across clusters. To facilitate the calculation of asymptotic approximations in a range of relevant settings for empirical practice, it is convenient to formally consider a sequence of populations where we can separately control the fraction of units in the population that are sampled and the fraction of clusters in the population that is sampled, as well as the assignment mechanism.

2.1 A Sequence of Populations

We have a sequence of populations indexed by $k$. The $k$-th population has $n_k$ units, indexed by $i=1,\ldots,n_k$. The population is partitioned into $m_k$ clusters. Let $m_{k,i}\in\{1,\ldots,m_k\}$ denote the cluster that unit $i$ of population $k$ belongs to. The number of units in cluster $m$ of population $k$ is $n_{k,m}\geq 1$. For each unit, $i$, there are two potential outcomes, $y_{k,i}(1)$ and $y_{k,i}(0)$, corresponding to treatment and no treatment. Thus the population is characterized by the set of triples $(m_{k,i},y_{k,i}(0),y_{k,i}(1))$, for units $1,\ldots,n_k$ and clusters $1,\ldots,m_k$. The object of interest is the population average treatment effect

$$\tau_k=\frac{1}{n_k}\sum_{i=1}^{n_k}\bigl(y_{k,i}(1)-y_{k,i}(0)\bigr).$$

The population average treatment effect by cluster is

$$\tau_{k,m}=\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigl(y_{k,i}(1)-y_{k,i}(0)\bigr).$$

Therefore,

$$\tau_k=\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}\tau_{k,m}.$$

We assume that potential outcomes, $y_{k,i}(1)$ and $y_{k,i}(0)$, are bounded in absolute value, uniformly for all $(k,i)$.
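As a quick numerical check of the decomposition of the population average treatment effect into a size-weighted average of cluster-level effects, consider a toy population. The five units and their potential outcomes below are made up for illustration.

```python
# Toy finite population (values are made up): (cluster, y(0), y(1)) per unit.
population = [
    (1, 1.0, 2.0), (1, 0.5, 2.5), (1, 2.0, 2.0),
    (2, 0.0, 3.0), (2, 1.0, 1.5),
]

def ate(units):
    """Average of y(1) - y(0) over the given units."""
    return sum(y1 - y0 for _, y0, y1 in units) / len(units)

n = len(population)
tau = ate(population)

# Cluster-level effects and their size-weighted average.
clusters = sorted({m for m, _, _ in population})
tau_by_cluster = {m: ate([u for u in population if u[0] == m]) for m in clusters}
size = {m: sum(1 for u in population if u[0] == m) for m in clusters}
weighted = sum(size[m] / n * tau_by_cluster[m] for m in clusters)

print(tau, tau_by_cluster, weighted)
```

The cluster effects are 1.0 and 1.75 with weights 3/5 and 2/5, so the direct average and the weighted average agree at 1.3 up to floating-point rounding.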

For each unit in the population, we define the stochastic treatment indicator, $W_{k,i}\in\{0,1\}$. The realized outcome for unit $i$ in population $k$ is $Y_{k,i}=y_{k,i}(W_{k,i})$. For a random sample of the population, we observe the triple $(Y_{k,i},W_{k,i},m_{k,i})$. Inclusion in the sample is represented by the random variable $R_{k,i}$, which takes value one if unit $i$ belongs to the sample, and value zero if not. We next describe the two components of the stochastic nature of the sample: the sampling process that determines the values of $R_{k,i}$, and the assignment process that determines the values of $W_{k,i}$.

2.2 The Sampling Process

The sampling process that determines the values of $R_{k,i}$ is independent of the potential outcomes and the assignments. It consists of two stages. First, clusters are sampled with cluster sampling probability $q_k\in(0,1]$. Second, units are sampled from the subpopulation consisting of all the sampled clusters, with unit sampling probability equal to $p_k\in(0,1]$. Both $q_k$ and $p_k$ may be equal to one, or close to zero. If $q_k=1$, we sample all clusters. If $p_k=1$, we sample all units from the sampled clusters. If $q_k=p_k=1$, all units in the population are sampled. The standard framework for analyzing clustering focuses on the special case where $q_k\rightarrow 0$, so only a small fraction of the clusters in the population are sampled. The case $q_k=1$ and $p_k\rightarrow 0$ corresponds to taking a relatively small random sample of units from the population. While this is an important special case, there are also many applications where the sampled clusters comprise a large fraction of the overall set of clusters. We refer to the case of $q_k=1$ as random sampling and to the case of $q_k<1$ as clustered sampling.
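The two sampling stages can be simulated in a few lines. The sketch assumes independent Bernoulli draws at each stage, which is one way to realize the stated sampling probabilities; the function name `draw_sample` and the toy population of 100 clusters with 50 units each are hypothetical.

```python
import random

def draw_sample(cluster_of, q, p, rng):
    """Two-stage sampling: returns the list of 0/1 inclusion indicators R.

    cluster_of[i] is the cluster label of unit i; q and p are the cluster
    and unit sampling probabilities (independent Bernoulli draws assumed).
    """
    clusters = sorted(set(cluster_of))
    # Stage 1: each cluster is kept independently with probability q.
    kept = {m for m in clusters if rng.random() < q}
    # Stage 2: each unit in a kept cluster is sampled with probability p.
    return [1 if (m in kept and rng.random() < p) else 0 for m in cluster_of]

cluster_of = [m for m in range(100) for _ in range(50)]   # 100 clusters of 50 units
rng = random.Random(0)
R_random = draw_sample(cluster_of, q=1.0, p=0.1, rng=rng)     # random sampling
R_clustered = draw_sample(cluster_of, q=0.2, p=1.0, rng=rng)  # clustered sampling
print(sum(R_random), sum(R_clustered))
```

With $q_k=1$ every cluster is eligible and about a tenth of all units end up in the sample. With $q_k=0.2$ and $p_k=1$ each cluster is either fully included or fully excluded.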

2.3 The Assignment Process

The assignment process that determines the values of $W_{k,i}$ also consists of two stages. In the first stage of the assignment process, for cluster $m$ in population $k$, an assignment probability $A_{k,m}\in[0,1]$ is drawn randomly from a distribution with mean $\mu_k$, bounded away from zero and one uniformly in $k$, and variance $\sigma^2_k$, independently for each cluster. The variance $\sigma^2_k$ is key. If $\sigma^2_k$ is zero, then $A_{k,m}$ is the same for all $m$, and $W_{k,i}$ is randomly assigned across clusters. We refer to this case as random assignment. For positive values of $\sigma_k^2$ assignment probabilities depend on cluster. Because $A_{k,m}^2\leq A_{k,m}$, it follows that $\sigma_k^2$ is bounded above by $\mu_k(1-\mu_k)$ and that the bound is attained when $A_{k,m}$ can only take values zero or one, so all units within a cluster have the same values for the treatment. We use the term clustered assignment to refer to the case $\sigma_k^2=\mu_k(1-\mu_k)$, when there is no within-cluster variation in $W_{k,i}$. We use the term partially clustered assignment to refer to the case $0<\sigma_k^2<\mu_k(1-\mu_k)$, where assignment depends on cluster but not all units in the same cluster necessarily have the same value of $W_{k,i}$. In the second stage of the assignment process, each unit in cluster $m$ is assigned to the treatment independently, with cluster-specific probability $A_{k,m}$.
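The upper bound on the variance of the assignment probabilities follows in one line from the fact that a number in $[0,1]$ is at least as large as its square:

```latex
\sigma_k^2
  = E\bigl[A_{k,m}^2\bigr] - \mu_k^2
  \;\le\; E\bigl[A_{k,m}\bigr] - \mu_k^2
  = \mu_k - \mu_k^2
  = \mu_k(1-\mu_k).
```

Equality requires $A_{k,m}^2=A_{k,m}$ with probability one, that is, $A_{k,m}\in\{0,1\}$, which is the case in which treatment is constant within each cluster.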

3 The Least Squares Estimator and its Variance

Let

$$N_{k,1}=\sum_{i=1}^{n_k}R_{k,i}W_{k,i}\quad\text{and}\quad N_{k,0}=\sum_{i=1}^{n_k}R_{k,i}(1-W_{k,i})$$

be the number of treated and untreated units in the sample, respectively; these are random variables. The total sample size is $N_k=N_{k,1}+N_{k,0}$.

We first analyze the OLS estimator of a regression of the outcome $Y_{k,i}$ on an intercept and the treatment indicator $W_{k,i}$. The OLS estimator (modified so it is well-defined even when $N_{k,1}=0$ or $N_{k,0}=0$) is equal to the difference in means:

$$\widehat{\tau}_k=\frac{1}{N_{k,1}\vee 1}\sum_{i=1}^{n_k}R_{k,i}W_{k,i}Y_{k,i}-\frac{1}{N_{k,0}\vee 1}\sum_{i=1}^{n_k}R_{k,i}(1-W_{k,i})Y_{k,i}, \tag{1}$$

where $N_{k,1}\vee 1$ and $N_{k,0}\vee 1$ are the maxima of $N_{k,1}$ and 1 and of $N_{k,0}$ and 1, respectively.
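Equation (1) translates directly into code. The function below is a minimal sketch that takes the outcomes, treatment indicators, and sampling indicators for all population units as plain lists; the name `diff_in_means` and the toy inputs are illustrative.

```python
def diff_in_means(Y, W, R):
    """Equation (1): difference in sample means of treated and control units.

    Y, W, R are equal-length lists of outcomes, treatment indicators (0/1)
    and sampling indicators (0/1). Denominators are max(N, 1), so the
    estimator stays defined when one group is empty.
    """
    n1 = sum(r * w for r, w in zip(R, W))
    n0 = sum(r * (1 - w) for r, w in zip(R, W))
    s1 = sum(r * w * y for r, w, y in zip(R, W, Y))
    s0 = sum(r * (1 - w) * y for r, w, y in zip(R, W, Y))
    return s1 / max(n1, 1) - s0 / max(n0, 1)

Y = [3.0, 1.0, 4.0, 2.0, 5.0]
W = [1, 0, 1, 0, 1]
R = [1, 1, 1, 0, 1]          # the fourth unit is not sampled
tau_hat = diff_in_means(Y, W, R)

# No treated units in the sample: the first term is 0 / 1 = 0.
no_treated = diff_in_means([2.0, 4.0], [0, 0], [1, 1])
print(tau_hat, no_treated)
```

In the first call the treated mean is (3 + 4 + 5)/3 = 4 and the control mean is 1, so the estimate is 3. In the second call the treated sum is divided by one rather than zero, giving 0 - 3 = -3.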

We make the following assumptions about the sampling process and the cluster sizes: (i) $m_kq_k\rightarrow\infty$, (ii) $\liminf_{k\rightarrow\infty}p_k\min_m n_{k,m}>0$, and (iii) $\limsup_{k\rightarrow\infty}\max_m n_{k,m}/\min_m n_{k,m}<\infty$. The first assumption implies that the expected number of sampled clusters goes to infinity as $k$ increases. The second assumption implies that the average number of observations sampled per cluster, conditional on the cluster being sampled, does not go to zero. The third assumption restricts the imbalance between the number of units across clusters. Notice that assumptions (i) and (ii) imply $n_kp_kq_k\rightarrow\infty$, so the sample size becomes larger in expectation as $k$ increases.

3.1 Large $k$ Distribution of the Least Squares Estimator

Our first main result derives the large $k$ distribution of $\widehat{\tau}_k$. Let $\alpha_k=(1/n_k)\sum_{i=1}^{n_k}y_{k,i}(0)$, $u_{k,i}(1)=y_{k,i}(1)-(\alpha_k+\tau_k)$, and $u_{k,i}(0)=y_{k,i}(0)-\alpha_k$. Under additional regularity conditions in the Appendix,

$$\sqrt{N_k}(\widehat{\tau}_k-\tau_k)/v_k^{1/2}\stackrel{d}{\longrightarrow}N(0,1),$$

where

$$
\begin{aligned}
v_k &= \frac{1}{n_k}\sum_{i=1}^{n_k}\bigg(\frac{u^2_{k,i}(1)}{\mu_k}+\frac{u^2_{k,i}(0)}{1-\mu_k}\bigg) \\
&\quad - p_k\frac{1}{n_k}\sum_{i=1}^{n_k}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2 - p_k\sigma_k^2\frac{1}{n_k}\sum_{i=1}^{n_k}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)^2 \\
&\quad + p_k(1-q_k)\frac{1}{n_k}\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2 \\
&\quad + p_k\sigma^2_k\frac{1}{n_k}\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2.
\end{aligned}
\tag{2}
$$
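Because equation (2) involves both potential outcomes for every unit, it cannot be computed from a sample, but it can be evaluated for a hypothetical population in which both potential outcomes are known. The function below does this term by term; its name, argument names, and the four-unit toy population are assumptions made for illustration.

```python
from collections import defaultdict

def large_k_variance(y0, y1, cluster, mu, sigma2, p, q):
    """Evaluate the five terms of equation (2) for a known finite population."""
    n = len(y0)
    alpha = sum(y0) / n
    tau = sum(b - a for a, b in zip(y0, y1)) / n
    u0 = [a - alpha for a in y0]
    u1 = [b - (alpha + tau) for b in y1]
    d = [b - a for a, b in zip(u0, u1)]                    # u(1) - u(0)
    s = [b / mu + a / (1 - mu) for a, b in zip(u0, u1)]    # u(1)/mu + u(0)/(1-mu)

    # Within-cluster sums used by the last two terms.
    d_sum, s_sum = defaultdict(float), defaultdict(float)
    for m, di, si in zip(cluster, d, s):
        d_sum[m] += di
        s_sum[m] += si

    term1 = sum(b * b / mu + a * a / (1 - mu) for a, b in zip(u0, u1)) / n
    term2 = -p * sum(di * di for di in d) / n
    term3 = -p * sigma2 * sum(si * si for si in s) / n
    term4 = p * (1 - q) * sum(v * v for v in d_sum.values()) / n
    term5 = p * sigma2 * sum(v * v for v in s_sum.values()) / n
    return term1 + term2 + term3 + term4 + term5

y0 = [0.0, 1.0, 2.0, 3.0]
cluster = [1, 1, 2, 2]

# Constant treatment effect and sigma2 = 0: only the first term survives.
v_const = large_k_variance(y0, [y + 1.0 for y in y0], cluster,
                           mu=0.5, sigma2=0.0, p=1.0, q=0.5)

# Effects that differ across clusters: clustered sampling (q < 1) adds variance.
y1 = [1.0, 2.0, 5.0, 6.0]
v_q1 = large_k_variance(y0, y1, cluster, mu=0.5, sigma2=0.0, p=1.0, q=1.0)
v_qhalf = large_k_variance(y0, y1, cluster, mu=0.5, sigma2=0.0, p=1.0, q=0.5)
print(v_const, v_q1, v_qhalf)
```

In the toy population, the variance with a constant effect does not depend on $q_k$, whereas with cluster-level heterogeneity in effects moving from $q_k=1$ to $q_k=0.5$ raises the variance through the fourth term.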

The expression for the variance $v_k$ has multiple terms that make its interpretation challenging. We first interpret $v_k$ in some special cases to highlight the implications of clustered sampling and clustered assignment. In Section 3.3, we compare $v_k$ to the large-$k$ form of the robust and cluster variance estimators.

For the case of random sampling ($q_k=1$) and random assignment ($\sigma_k^2=0$), the variance simplifies to

$$\frac{1}{n_k}\sum_{i=1}^{n_k}\bigg(\frac{u^2_{k,i}(1)}{\mu_k}+\frac{u^2_{k,i}(0)}{1-\mu_k}\bigg)-p_k\frac{1}{n_k}\sum_{i=1}^{n_k}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2.$$

As we show in Section 3.2 below, the first term in this variance is estimated by the robust variance estimator. The second term is a finite sample correction that is familiar from the literature on randomized experiments [e.g., Neyman, 1923; Imbens and Rubin, 2015; Abadie et al., 2020]. This finite sample correction vanishes if there is either no heterogeneity in the treatment effects (so $u_{k,i}(1)-u_{k,i}(0)=y_{k,i}(1)-y_{k,i}(0)-\tau_k=0$), or if the sample is a small fraction of the population ($p_k\approx 0$).

Adding clustered sampling, $q_k<1$, increases the variance by

$$p_{k}(1-q_{k})\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2},$$

which is the same as

$$p_{k}(1-q_{k})\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}n_{k,m}^{2}(\tau_{k,m}-\tau_{k})^{2}.$$

This term vanishes if there is no heterogeneity in the average treatment effect across clusters. Although the sample is informative about heterogeneity in cluster average treatment effects, it is not informative about the value of $q_k$. Information about the need to adjust for clustered sampling ($q_k<1$) must come from outside the sample.

Clustered assignment, $\sigma^{2}_{k}>0$, adds two terms to the variance,

$$-p_{k}\sigma_{k}^{2}\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2}+p_{k}\sigma^{2}_{k}\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)\Bigg)^{2}.$$

As we explain in more detail in Section 3.3, the sign of this expression depends on the amount of variation in potential outcomes that can be explained by the clusters. Note that, in contrast to the lack of sample information about the need to adjust for clustered sampling, the sample is potentially informative about the need to account for clustered assignment.

The five terms making up the asymptotic variance $v_k$ can be of different order. The first term is an average of bounded terms, and so under our assumptions will be of order $\mathcal{O}(1)$. The second and third terms will be at most of the same order as the first one. If $p_k\approx 0$, so that we can think of the sample as small relative to the population of sampled clusters, the first term dominates the second and third terms. If cluster sizes are bounded as $k$ increases, the fourth and fifth terms are also of order $\mathcal{O}(1)$. If, on the other hand, cluster sizes increase with $k$, these terms can be of higher order and dominate the variance. Whether they do so or not depends on (i) the magnitude of $p_k$, (ii) the presence of clustering in sampling, (iii) the presence of clustering in assignment, and (iv) the heterogeneity in potential outcomes.

3.2 The Robust and Cluster Robust Variance Estimators

Let $\widehat{U}_{k,i}=Y_{k,i}-\widehat{\alpha}_{k}-\widehat{\tau}_{k}W_{k,i}$ be the residuals from the regression of $Y_{k,i}$ on a constant and $W_{k,i}$. Here, $\widehat{\alpha}_{k}$ is the intercept of the regression and $\widehat{\tau}_{k}$ is the coefficient on $W_{k,i}$ (equal to the expression in (1) with probability approaching one).

There are two common estimators of the variance of $\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})$. First, the conventional robust variance estimator (eicker1963, huber1967behavior, white1980heteroskedasticity):

$$\widehat{V}^{\rm robust}_{k}=\frac{1}{\overline{W}_{k}^{2}(1-\overline{W}_{k})^{2}}\left\{\frac{1}{N_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widehat{U}_{k,i}^{2}(W_{k,i}-\overline{W}_{k})^{2}\right\}, \tag{3}$$

where

$$\overline{W}_{k}=\frac{1}{N_{k}\vee 1}\sum_{i=1}^{n_{k}}R_{k,i}W_{k,i}.$$

Let

$$v^{\rm robust}_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg).$$

Under regularity conditions (see appendix), $\widehat{V}_{k}^{\rm robust}$ and $v_{k}^{\rm robust}$ are close in the following sense,

$$\frac{\widehat{V}_{k}^{\rm robust}}{v_{k}}=\frac{v_{k}^{\rm robust}}{v_{k}}+o_{p}(1),$$

motivating our focus on the comparison of $v_{k}^{\rm robust}$ and $v_{k}$. In general, the difference $v^{\rm robust}_{k}-v_{k}$ can be positive or negative, so the robust variance estimator can be invalid in large samples.

The second common variance estimator is the cluster variance [liang1986longitudinal, arellano1987practitioners],

$$\widehat{V}^{\rm cluster}_{k}=\frac{1}{\overline{W}_{k}^{2}(1-\overline{W}_{k})^{2}}\left\{\frac{1}{N_{k}}\sum_{m=1}^{m_{k}}\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widehat{U}_{k,i}(W_{k,i}-\overline{W}_{k})\right)^{2}\right\}. \tag{4}$$

Define

$$
\begin{aligned}
v^{\rm cluster}_{k} &= \frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg) \\
&\quad -p_{k}\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\big(u_{k,i}(1)-u_{k,i}(0)\big)^{2}-p_{k}\sigma_{k}^{2}\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2} \\
&\quad +p_{k}\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2} \\
&\quad +p_{k}\sigma^{2}_{k}\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)\Bigg)^{2}.
\end{aligned}
$$

Then, $\widehat{V}_{k}^{\rm cluster}$ is close to $v_{k}^{\rm cluster}$ in the sense that

$$\frac{\widehat{V}_{k}^{\rm cluster}}{v_{k}}=\frac{v_{k}^{\rm cluster}}{v_{k}}+o_{p}(1).$$

The difference $v^{\rm cluster}_{k}-v_{k}$ is always nonnegative. Therefore, for large $k$, the cluster variance estimator can be conservative but cannot underestimate the variance of $\widehat{\tau}_{k}$.
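The two estimators in (3) and (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `robust_and_cluster_variance` is hypothetical, the inputs are assumed to contain only the sampled units (those with $R_{k,i}=1$), and the regression on a constant and $W_{k,i}$ is computed as a difference in means.

```python
import numpy as np

def robust_and_cluster_variance(y, w, cluster):
    """Normalized variance estimates for sqrt(N) * (tau_hat - tau), as in (3) and (4).

    y, w, cluster: 1-D arrays over the sampled units only (assumption: R = 1 for all rows).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n_obs = y.size
    w_bar = w.mean()

    # Least squares on a constant and a binary regressor is a difference in means.
    alpha_hat = y[w == 0].mean()
    tau_hat = y[w == 1].mean() - alpha_hat
    resid = y - alpha_hat - tau_hat * w

    score = resid * (w - w_bar)
    denom = w_bar ** 2 * (1.0 - w_bar) ** 2

    # Robust variance (3): sum of squared unit-level scores.
    v_robust = np.sum(score ** 2) / n_obs / denom

    # Cluster variance (4): sum scores within clusters first, then square.
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    cluster_sums = np.bincount(idx, weights=score)
    v_cluster = np.sum(cluster_sums ** 2) / n_obs / denom

    return tau_hat, v_robust, v_cluster
```

Because both quantities estimate the variance of the normalized estimator, a standard error for $\widehat{\tau}_k$ would be obtained as the square root of the estimate divided by $N_k$. When every unit is its own cluster, the two estimates coincide.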

3.3 Discussion

From the formulas for $v_{k}$, $v_{k}^{\rm robust}$, and $v_{k}^{\rm cluster}$ it follows that if $p_{k}$ is small enough, then $v_{k}^{\rm robust}$ and $v_{k}^{\rm cluster}$ are approximately equal to $v_{k}$. In this case, clustered sampling and clustered assignment do not matter much because the probability that two sample units belong to the same cluster is small.

The difference $v_{k}^{\rm robust}-v_{k}$ depends on two terms. The first term,

$$p_{k}\frac{1}{n_{k}}\Bigg[\sum_{i=1}^{n_{k}}\big(u_{k,i}(1)-u_{k,i}(0)\big)^{2}-(1-q_{k})\sum_{m=1}^{m_{k}}n_{k,m}^{2}(\tau_{k,m}-\tau_{k})^{2}\Bigg], \tag{5}$$

is equal to zero when treatment effects are constant (in which case, $u_{k,i}(1)-u_{k,i}(0)=0$ for $i=1,\ldots,n_{k}$ and $\tau_{k,m}-\tau_{k}=0$ for all $m=1,\ldots,m_{k}$). If all clusters are sampled, so $q_{k}=1$, and treatment effects are heterogeneous, (5) is positive. When only a fraction of the clusters are sampled, $q_{k}<1$, the sign of (5) depends on the extent to which heterogeneity in treatment effects can be explained by the clusters. If there is no variation in average treatment effects across clusters, the expression in (5) is non-negative. However, when clusters explain much of the variation in treatment effects, the expression in (5) can be negative and very large in magnitude because of the factor $n_{k,m}^{2}$. The second term of $v_{k}^{\rm robust}-v_{k}$ is equal to

$$
\begin{aligned}
p_{k}\sigma^{2}_{k}\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}\Bigg[&\frac{1}{n_{k,m}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2} \\
&-n_{k,m}\Bigg(\frac{1}{n_{k,m}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)\Bigg)^{2}\Bigg].
\end{aligned}
\tag{6}
$$

This term is equal to zero if there is no clustered assignment, that is, $\sigma^{2}_{k}=0$. If $\sigma^{2}_{k}>0$, the sign of (6) depends on how much of the heterogeneity in potential outcomes is explained by the clusters. The expression in (6) is close to zero when there is little heterogeneity in potential outcomes, so $u_{k,i}(1)$ and $u_{k,i}(0)$ are close to zero. If there is heterogeneity in potential outcomes but average potential outcomes are nearly constant across clusters, (6) is positive. When the clusters explain enough heterogeneity in potential outcomes, (6) can be negative and potentially very large in magnitude because of the factor $n_{k,m}$ multiplying the second term of the sum in (6). That is, the robust variance formula can severely underestimate the variance of $\widehat{\tau}_{k}$.

Clustered standard errors are conservative in general, that is, $v_{k}^{\rm cluster}\geq v_{k}$. In particular, the difference $v_{k}^{\rm cluster}-v_{k}$ is

$$v_{k}^{\rm cluster}-v_{k}=p_{k}q_{k}\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2},$$

which can be rewritten as

$$v_{k}^{\rm cluster}-v_{k}=\left(\frac{p_{k}n_{k}}{m_{k}}\right)q_{k}\left\{\frac{1}{m_{k}}\sum_{m=1}^{m_{k}}\left(\frac{n_{k,m}m_{k}}{n_{k}}\right)^{2}(\tau_{k,m}-\tau_{k})^{2}\right\}. \tag{7}$$

When the expected fraction of clusters in the sample, $q_{k}$, is small, or when the average treatment effect is nearly constant between clusters, then $v_{k}^{\rm cluster}\approx v_{k}$. Aside from these special cases, the $p_{k}n_{k}/m_{k}$ factor in the formula above indicates that cluster standard errors can be extremely conservative in general.
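To get a feel for the size of the excess in (7), the following sketch evaluates both displayed forms of $v_k^{\rm cluster}-v_k$ from cluster sizes and cluster-level average treatment effects. The function name and the numerical inputs are illustrative assumptions; the only fact used from the text is that the within-cluster sum of $u_{k,i}(1)-u_{k,i}(0)$ equals $n_{k,m}(\tau_{k,m}-\tau_k)$.

```python
import numpy as np

def cluster_variance_excess(n_m, tau_m, p, q):
    """Evaluate v_cluster - v_k in its two equivalent forms.

    n_m:   population cluster sizes n_{k,m}
    tau_m: cluster average treatment effects tau_{k,m}
    p, q:  unit and cluster sampling probabilities p_k and q_k
    """
    n_m = np.asarray(n_m, dtype=float)
    tau_m = np.asarray(tau_m, dtype=float)
    n_total = n_m.sum()
    n_clusters = n_m.size

    # The population average effect is the size-weighted mean of cluster effects.
    tau = np.sum(n_m * tau_m) / n_total

    # First form: squared within-cluster sums of u(1) - u(0).
    form_a = p * q * np.sum((n_m * (tau_m - tau)) ** 2) / n_total

    # Second form, equation (7): average cluster size times a scale-free term.
    relative_size = n_m * n_clusters / n_total
    form_b = (p * n_total / n_clusters) * q * np.mean(relative_size ** 2 * (tau_m - tau) ** 2)

    return form_a, form_b

# Hypothetical example: 50 equal-sized clusters, effects alternating between -1 and +1.
a, b = cluster_variance_excess(np.full(50, 1000), np.tile([-1.0, 1.0], 25), p=1.0, q=1.0)
print(a, b)  # both forms give 1000.0
```

In this made-up configuration the excess equals the average cluster size, while the leading term of $v_k$ is of order one, which illustrates why the conventional cluster variance can be far too large when all clusters are sampled.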

4 Two New Variance Estimators

Estimation of the variance of $\widehat{\tau}_{k}$ is challenging because the different terms in $v_{k}$ can be of different orders of magnitude. In this section, we propose two estimators of the variance of $\widehat{\tau}_{k}$ that allow us to correct the bias of the cluster variance estimator, one analytic and one resampling-based. As the expression for the bias of the cluster variance in (7) shows, the cluster variance is heavily biased if the fraction of the sampled clusters is large and there is substantial variation in the cluster-specific treatment effects. Although the proposed analytic variance estimator is defined irrespective of the value of $\sigma_{k}^{2}$, for the correction to be effective we need to be able to estimate the cluster-specific treatment effects, and thus we need $\sigma_{k}^{2}$ to be less than its maximum value of $\mu_{k}(1-\mu_{k})$ to ensure that there is variation in the treatment assignment within clusters. One of the proposed variance estimators is based on a correction to $\widehat{V}^{\rm cluster}_{k}$, and the other is based on resampling methods. An alternative would be to directly estimate the bias term in (7) and subtract that from the cluster variance. A challenge with this approach is that the estimation error for the adjustment term is large (often leading to negative variance estimates) because the order of magnitude of the correction is itself large. This approach did not work well in our simulations. We do not report formal results for the variance estimators in the current paper. We demonstrate their performance in the simulations in Section 6. There may well be further refinements possible.

If $q_{k}$ is close to zero, the proposed variance estimators are close to $\widehat{V}^{\rm cluster}_{k}$, which has little bias in that case. If $\sigma_{k}^{2}=\mu_{k}(1-\mu_{k})$ (that is, when $W_{k,i}$ is constant within clusters), the proposed resampling variance estimator is not defined. To be effective, both variance estimators rely on estimating the variation in treatment effects across clusters, and therefore require a substantial number of both treated and control observations per cluster. The proposed variance estimators lead to substantial improvements over $\widehat{V}^{\rm cluster}_{k}$ in cases where $\widehat{V}^{\rm cluster}_{k}$ has a large upward bias. The downside of the proposed variance estimators is that they can be conservative when there is no need to cluster because there is no heterogeneity in treatment effects, or when there are too few treated and control observations per cluster to estimate the heterogeneity in the treatment effects precisely.

We first consider in Section 4.1 the case with $q_{k}=1$, so we have random sampling. Next we consider in Section 4.2 the case with clustered sampling, $q_{k}<1$. In Section 4.3 we propose a bootstrap procedure for estimating the variance. The proposed variance estimators perform very well in the simulation study of Section 6. The derivation of their formal properties is left for future work.

4.1 The Case with All Clusters Observed

First we focus on the case with $q_{k}=1$ (all clusters observed), but allowing for general $p_{k}$. Let $U_{k,i}=W_{k,i}u_{k,i}(1)+(1-W_{k,i})u_{k,i}(0)$. The first step is to approximate the normalized error of the least squares estimator $\widehat{\tau}_{k}$ by a normalized sample average over clusters,

$$\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})/v_{k}^{1/2}=\frac{1}{\sqrt{n_{k}p_{k}v_{k}}\,\mu_{k}(1-\mu_{k})}\sum_{m=1}^{m_{k}}C_{k,m}+o_{p}(1), \tag{8}$$

where the terms

$$C_{k,m}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}$$

are independent across clusters. In the appendix, we show

$$\widehat{V}_{k}^{\rm cluster}/v_{k}=\frac{1}{n_{k}p_{k}v_{k}}\left(\frac{1}{\mu_{k}(1-\mu_{k})}\right)^{2}\sum_{m=1}^{m_{k}}C_{k,m}^{2}+o_{p}(1). \tag{9}$$

The expectation of $C_{k,m}$ is

$$E[C_{k,m}]=n_{k,m}p_{k}\mu_{k}(1-\mu_{k})(\tau_{k,m}-\tau_{k}),$$

with sum over clusters

$$\sum_{m=1}^{m_{k}}E[C_{k,m}]=p_{k}\mu_{k}(1-\mu_{k})\sum_{m=1}^{m_{k}}n_{k,m}(\tau_{k,m}-\tau_{k})=0. \tag{10}$$

That is, although the sum of the expectations of $C_{k,m}$ over clusters is equal to zero, these expectations are not equal to zero in general for each cluster separately. Because $\mbox{var}(C_{k,m})\leq E[C_{k,m}^{2}]$, the first term on the right-hand side of (9) is conservative in expectation relative to the variance of $\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})/v_{k}^{1/2}$, which explains the conservativeness of $\widehat{V}_{k}^{\rm cluster}$.

Because of (10), we can replace the terms $C_{k,m}$ in (8) by $C_{k,m}-E[C_{k,m}]=C_{k,m,1}+C_{k,m,2}$, where

$$C_{k,m,1}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(R_{k,i}-p_{k})(\tau_{k,m}-\tau_{k})\mu_{k}(1-\mu_{k}),$$

and

$$C_{k,m,2}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigl((W_{k,i}-\mu_{k})U_{k,i}-(\tau_{k,m}-\tau_{k})\mu_{k}(1-\mu_{k})\Bigr).$$

Therefore,

$$\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})/v_{k}^{1/2}=\frac{1}{\sqrt{n_{k}p_{k}v_{k}}\,\mu_{k}(1-\mu_{k})}\left(\sum_{m=1}^{m_{k}}C_{k,m,1}+\sum_{m=1}^{m_{k}}C_{k,m,2}\right)+o_{p}(1). \tag{11}$$

It can be shown that $C_{k,m,1}$ and $C_{k,m,2}$ have means equal to zero and are uncorrelated. In addition, $C_{k,m,1}$ and $C_{k,m,2}$ are uncorrelated across clusters. The variance of $\sum_{m=1}^{m_{k}}C_{k,m,1}/(\sqrt{n_{k}p_{k}}\,\mu_{k}(1-\mu_{k}))$ is

$$(1-p_{k})\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.$$

Let $\widehat{\tau}_{k,m}$ be the difference between the sample averages of the outcome for treated and nontreated units in cluster $m$. A direct estimator of the variance of $\sum_{m=1}^{m_{k}}C_{k,m,2}$ is

$$\sum_{m=1}^{m_{k}}\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigl((W_{k,i}-\overline{W}_{k})\widehat{U}_{k,i}-(\widehat{\tau}_{k,m}-\widehat{\tau}_{k})\overline{W}_{k}(1-\overline{W}_{k})\Bigr)\right)^{2}. \tag{12}$$

In practice, the estimator in (12) is biased because of the correlations between the estimation errors of its components. We apply sample splitting to address this bias. We first split the sample randomly into two subsamples. Let $Z_{k,i}\in\{0,1\}$ be the indicator that unit $i$ belongs to the second subsample, and let $\overline{Z}_{k}$ be the mean of $Z_{k,i}$. Using the subsample with $Z_{k,i}=0$, we obtain estimates $\widehat{\tau}_{k,m}^{\,*}$, $\widehat{\alpha}_{k}^{\,*}$, and $\widehat{\tau}_{k}^{\,*}$ of $\tau_{k,m}$, $\alpha_{k}$, and $\tau_{k}$, respectively. Next, for observations with $Z_{k,i}=1$, we calculate the residuals $\widehat{U}_{k,i}^{\,*}=Y_{k,i}-\widehat{\alpha}_{k}^{\,*}-\widehat{\tau}_{k}^{\,*}W_{k,i}$. Finally, we estimate the normalized variance for the case with $q_{k}=1$ as

$$
\begin{aligned}
\widehat{V}^{\rm CCV}_{k}(1) &= \frac{1}{N_{k}\overline{W}_{k}^{2}(1-\overline{W}_{k})^{2}}\sum_{m=1}^{m_{k}}\Bigg[\frac{1}{\overline{Z}_{k}^{2}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}Z_{k,i}\Big((W_{k,i}-\overline{W}_{k})\widehat{U}_{k,i}^{\,*} \\
&\qquad\qquad -(\widehat{\tau}_{k,m}^{\,*}-\widehat{\tau}_{k}^{\,*})\overline{W}_{k}(1-\overline{W}_{k})\Big)\Bigg)^{2} \\
&\qquad -\frac{1-\overline{Z}_{k}}{\overline{Z}_{k}^{2}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}Z_{k,i}\Big((W_{k,i}-\overline{W}_{k})\widehat{U}_{k,i}^{\,*}-(\widehat{\tau}_{k,m}^{\,*}-\widehat{\tau}_{k}^{\,*})\overline{W}_{k}(1-\overline{W}_{k})\Big)^{2}\Bigg] \\
&\quad +(1-p_{k})\sum_{m=1}^{m_{k}}\frac{\overline{N}_{k,m}}{N_{k}}(\widehat{\tau}_{k,m}-\widehat{\tau}_{k})^{2},
\end{aligned}
\tag{13}
$$

where $\overline{N}_{k,m}$ is the size of the sample in cluster $m$. For clusters with no variation in the treatment variable, we replace $\widehat{\tau}_{k,m}$ in (13) with $\widehat{\tau}_{k}$. For clusters with no variation in the treatment variable for a particular subsample, we replace $\widehat{\tau}_{k,m}^{\,*}$ in (13) with $\widehat{\tau}_{k}^{\,*}$. We derive the form of the CCV estimator in the appendix. To improve the precision of $\widehat{V}^{\rm CCV}_{k}(1)$, we re-estimate it multiple times with new sample splits (new values for $Z_{k,i}$) and then average the corresponding variance estimators. In our simulations of Section 6, we re-estimate the variance estimator four times, and use sample splits with an equal number of units in each subsample in expectation, so $E[\overline{Z}_{k}]=1/2$.
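The sample-splitting recipe behind (13) can be sketched as follows. This is an illustrative reading of the formula rather than the authors' implementation: the function names are hypothetical, the inputs are assumed to hold only sampled units, $p_k$ is assumed known and passed as `p`, and each split is assumed to leave both treated and control units in the first subsample.

```python
import numpy as np

def _diff_in_means(y, w):
    # Coefficient on w in a regression of y on a constant and binary w.
    return y[w == 1].mean() - y[w == 0].mean()

def _cluster_effects(y, w, idx, n_clusters, fallback):
    # Within-cluster differences in means. Clusters without both treated and
    # control units fall back to the overall estimate, as described in the text.
    n1 = np.bincount(idx, weights=w, minlength=n_clusters)
    n0 = np.bincount(idx, weights=1 - w, minlength=n_clusters)
    s1 = np.bincount(idx, weights=y * w, minlength=n_clusters)
    s0 = np.bincount(idx, weights=y * (1 - w), minlength=n_clusters)
    ok = (n1 > 0) & (n0 > 0)
    out = np.full(n_clusters, float(fallback))
    out[ok] = s1[ok] / n1[ok] - s0[ok] / n0[ok]
    return out

def ccv_all_clusters(y, w, cluster, p, n_splits=4, seed=0):
    """Sketch of V^CCV(1) in (13), averaged over n_splits random sample splits."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    n_obs, n_clusters = y.size, int(idx.max()) + 1
    w_bar = w.mean()
    scale = w_bar * (1 - w_bar)

    # Last term of (13): uses full-sample cluster effects.
    tau_hat = _diff_in_means(y, w)
    tau_m = _cluster_effects(y, w, idx, n_clusters, tau_hat)
    share = np.bincount(idx, minlength=n_clusters) / n_obs
    between = (1 - p) * np.sum(share * (tau_m - tau_hat) ** 2)

    estimates = []
    for _ in range(n_splits):
        z = rng.integers(0, 2, size=n_obs)  # Z = 1 marks the second subsample
        z_bar = z.mean()
        first = z == 0
        # Starred estimates come from the subsample with Z = 0.
        tau_s = _diff_in_means(y[first], w[first])
        alpha_s = y[first & (w == 0)].mean()
        tau_m_s = _cluster_effects(y[first], w[first], idx[first], n_clusters, tau_s)
        # Residual-based terms are evaluated on the subsample with Z = 1.
        u_s = y - alpha_s - tau_s * w
        d = z * ((w - w_bar) * u_s - (tau_m_s[idx] - tau_s) * scale)
        sum_d = np.bincount(idx, weights=d, minlength=n_clusters)
        sum_d2 = np.bincount(idx, weights=d ** 2, minlength=n_clusters)
        core = np.sum(sum_d ** 2 - (1 - z_bar) * sum_d2) / z_bar ** 2
        estimates.append(core / (n_obs * scale ** 2) + between)
    return float(np.mean(estimates))
```

The default of four splits mirrors the number of re-estimations reported for the simulations; averaging over more splits would be a natural variation if the estimate is noisy.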

4.2 The Case When Not All Clusters Are Sampled

To motivate the modification of the variance estimator $\widehat{V}_{k}^{\rm CCV}(1)$ for the $q_{k}<1$ case, notice that

$$v_{k}(q_{k})-v^{\rm cluster}_{k}=q_{k}\times(v_{k}(1)-v^{\rm cluster}_{k}),$$

where $v_{k}(q_{k})$ denotes the value of the true variance $v_{k}$ evaluated at $q_{k}$. That is, the variance for the general $q_{k}$ case is a convex combination of the true variance at $q_{k}=1$ and the cluster variance,

$$v_{k}(q_{k})=q_{k}\times v_{k}(1)+(1-q_{k})\times v^{\rm cluster}_{k}.$$
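The identity can be checked directly from the expression for the difference between the cluster variance and the true variance given in Section 3.3, using the fact that $v^{\rm cluster}_k$ does not depend on $q_k$. In the sketch below, $D_k$ is shorthand introduced here for convenience; it is not part of the original notation.

```latex
\begin{align*}
D_k &:= \frac{1}{n_k}\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2},\\
v^{\rm cluster}_k - v_k(q_k) &= p_k q_k D_k,
\qquad
v^{\rm cluster}_k - v_k(1) = p_k D_k,\\
v_k(q_k) - v^{\rm cluster}_k &= -q_k\,(p_k D_k) = q_k\big(v_k(1)-v^{\rm cluster}_k\big).
\end{align*}
```

Rearranging the last line gives the convex combination stated above.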

Let $\widehat{q}_{k}$ be the ratio between the number of sampled clusters and the total number of clusters in the population. The proposed variance estimator, $\widehat{V}^{\rm CCV}_{k}$, is a convex combination of $\widehat{V}^{\rm CCV}_{k}(1)$ and $\widehat{V}^{\rm cluster}_{k}$ with weights $\widehat{q}_{k}$ and $1-\widehat{q}_{k}$,

$$\widehat{V}^{\rm CCV}_{k}=\widehat{q}_{k}\times\widehat{V}^{\rm CCV}_{k}(1)+(1-\widehat{q}_{k})\times\widehat{V}^{\rm cluster}_{k}. \tag{14}$$

Computation of $\widehat{q}_{k}$ requires knowledge of $m_{k}$, the total number of clusters in the population.

4.3 A Bootstrap Variance Estimator

In the previous sections, we have discussed an analytic variance estimator. Here we suggest a resampling-based variance estimator, initially for the case with $q_{k}=1$. Like the causal bootstrap in imbens2021causal, the proposed bootstrap procedure takes into account the causal nature of the estimand and creates bootstrap samples where units (in this case clusters) have different assignments and assignment probabilities than they have in the original sample. It differs from earlier bootstrap variance estimators for clustered settings [e.g., cameron2015practitioner, menzel2021bootstrap] in that it allows for the possibility that a large fraction of clusters are observed.

The specific resampling procedure, which we call the two-stage-cluster-bootstrap (TSCB), consists of two stages. For each of the clusters, let $\overline{N}_{k,m}$ be the cluster-level sample size and $\overline{W}_{k,m}=N_{k,m,1}/(\overline{N}_{k,m}\vee 1)$ the cluster-level fraction of treated units. In the first stage of the bootstrap procedure, for each cluster we draw $\overline{W}^{\,b}_{k,m}$ with replacement from the empirical distribution of the cluster-level fractions of treated units, that is, with probability $1/m_{k}$ from the set $\{\overline{W}_{k,1},\ldots,\overline{W}_{k,m_{k}}\}$. In the second stage, we draw $\overline{N}_{k,m}\overline{W}^{\,b}_{k,m}$ units with replacement from the set of treated units in cluster $m$ and $\overline{N}_{k,m}(1-\overline{W}^{\,b}_{k,m})$ units with replacement from the set of untreated units in cluster $m$. For this to be well defined, all the $\overline{W}_{k,m}$ need to be strictly between zero and one. We do this for all clusters to create the bootstrap sample, and calculate the bootstrap standard errors as the standard deviation of the treatment effect estimates across bootstrap iterations.

Next, consider the case with $q_k<1$. In this case, we need to take into account the fact that we see only a fraction of the clusters in the population. We follow the approach proposed in chao1985bootstrap. Suppose $q_k=1/2$, so we observe half the clusters in the population. The bootstrap procedure first creates a pseudo population consisting of the clusters in the original sample, plus one additional replica of each cluster. Then, to get a bootstrap sample, we sample randomly, without replacement, from the clusters in this pseudo population. Given the clusters in the bootstrap sample, we proceed as before, and ultimately calculate the bootstrap variance as the variance of the estimator over the bootstrap samples. chao1985bootstrap provide details and extensions to the case where $1/q_k$ is not an integer.
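The pseudo-population step can be sketched in a few lines for the integer case. This is an illustration only: it assumes the sampled clusters are identified by labels and that $1/q_k$ is an integer, and the function name and arguments are hypothetical rather than taken from chao1985bootstrap.

```python
import random


def draw_bootstrap_clusters(sampled_clusters, q, rng):
    """Draw one bootstrap sample of clusters from a replicated pseudo population.

    Assumes 1/q is an integer; the non-integer extension mentioned in the
    text is not handled by this sketch.
    """
    copies = round(1 / q)
    if abs(copies * q - 1) > 1e-9:
        raise ValueError("this sketch only handles integer 1/q")
    # Pseudo population: every observed cluster appears 1/q times.
    pseudo_population = [c for c in sampled_clusters for _ in range(copies)]
    # Sample without replacement, keeping the original number of clusters.
    return rng.sample(pseudo_population, len(sampled_clusters))
```

With `q = 0.5` each observed cluster can appear at most twice in a bootstrap draw, and with `q = 1` the draw is a permutation of the observed clusters.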

The algorithm for the TSCB is summarized here.

Algorithm 1 Two Stage Cluster Bootstrap
Input:
      Sample $(Y_{k,i},W_{k,i},m_{k,i})$
      Fraction sampled clusters $q_k$
      Number of bootstrap replications $B$
Stage 1:
1a: Create pseudo population by replicating each cluster $1/q_k$ times
1b: For each cluster in the pseudo population, calculate the assignment probability $\overline{W}_{k,m}$
1c: Create a bootstrap sample of clusters by randomly drawing clusters from the pseudo population from Stage 1a
1d: For each sampled cluster, draw an assignment probability $A_{k,m}$ from the empirical distribution of the $\overline{W}_{k,m}$ from Stage 1b
Stage 2:
2a: Randomly draw from the set of treated units in cluster $m$, $\lfloor N_{k,m}A_{k,m}\rfloor$ units
2b: Randomly draw from the set of control units in cluster $m$, $\lfloor N_{k,m}(1-A_{k,m})\rfloor$ units
Calculations:
For the units in the bootstrap sample constructed in Stage 2, collect the values for $(Y_{k,i},W_{k,i},m_{k,i})$ and calculate the least squares or fixed effect estimator
Calculate the standard deviation of the least squares or fixed effect estimator over the $B$ bootstrap samples
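The steps of Algorithm 1 can be sketched as follows for the least squares (difference in means) estimator. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (a dictionary from cluster labels to lists of `(y, w)` pairs), the function names, and the restriction to integer $1/q_k$ are all choices made here for brevity.

```python
import math
import random
import statistics


def diff_in_means(units):
    """Least squares coefficient on a binary treatment: treated minus control mean."""
    treated = [y for y, w in units if w == 1]
    control = [y for y, w in units if w == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def tscb_standard_error(clusters, n_boot, rng, q=1.0):
    """Two-stage cluster bootstrap standard error (sketch of Algorithm 1).

    clusters: dict mapping a cluster label to a list of (y, w) pairs, w in {0, 1}.
    Every cluster must contain both treated and control units, and 1/q is
    assumed to be an integer.
    """
    copies = round(1 / q)
    labels = list(clusters)
    # Stage 1b: treated fraction per cluster (identical across replicas).
    fractions = {m: sum(w for _, w in clusters[m]) / len(clusters[m]) for m in labels}
    # Stage 1a: pseudo population with 1/q copies of each cluster.
    pseudo_population = [m for m in labels for _ in range(copies)]
    estimates = []
    for _ in range(n_boot):
        sample = []
        # Stage 1c: draw clusters without replacement from the pseudo population.
        for m in rng.sample(pseudo_population, len(labels)):
            # Stage 1d: draw an assignment probability from the empirical distribution.
            a = fractions[rng.choice(labels)]
            treated = [u for u in clusters[m] if u[1] == 1]
            control = [u for u in clusters[m] if u[1] == 0]
            n_m = len(clusters[m])
            # Stage 2a and 2b: resample units with replacement within the cluster.
            sample += rng.choices(treated, k=math.floor(n_m * a))
            sample += rng.choices(control, k=math.floor(n_m * (1 - a)))
        estimates.append(diff_in_means(sample))
    return statistics.stdev(estimates)
```

Replacing `diff_in_means` with a within-cluster estimator would give the fixed effect version; in that case the cluster label would need to be carried along with each resampled unit.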

5 The Fixed Effect Estimator

In this section, we report results for the fixed effect estimator often used in empirical research in economics. arellano1987practitioners, Bertrand2004did, cameron2015practitioner and mackinnon2021cluster have pointed out that cluster adjustments may still be necessary in fixed effects regressions. However, a view of clustering based on models with cluster-specific variance components creates ambiguity in the role of clustered standard errors for estimators with cluster fixed effects, which are specifically aimed to absorb cluster-level variation.

We first characterize the fixed effect estimator and derive its large $k$ distribution. Then, we discuss the properties of the two conventional variance estimators, the robust and cluster robust variance estimators. As in the least squares case, we find that the robust standard errors may be too small and the cluster standard errors may be unnecessarily large, especially in cases when the number of observations per cluster is large. We propose CCV and TSCB variance estimators. The CCV estimator for fixed effects has a different form than the one for least squares in section A.4.

The fixed effect estimator is based on a regression of the outcome on the treatment indicator and indicators for each of the clusters in the sample. It can be written as the least squares estimate for a regression of the outcome on the treatment, with both variables measured in deviation from cluster means,

\[
\widehat{\tau}_{k}^{\,\rm fixed}=\frac{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}Y_{k,i}(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}. \tag{15}
\]

As in section 3, we assume that potential outcomes are bounded, $m_{k}q_{k}\rightarrow\infty$, and $\limsup_{k\rightarrow\infty}\max_{m}n_{k,m}/\min_{m}n_{k,m}<\infty$. In addition, we assume (i) $(m_{k}q_{k})/((p_{k}n_{k})/m_{k})\rightarrow 0$, and (ii) the supports of the cluster probabilities, $A_{k,m}$, are bounded away from zero and one (uniformly in $k$ and $m$). Assumption (i) restricts the focus of our analysis in this section to settings where the expected number of sampled clusters is small relative to the expected number of sampled observations per sampled cluster. Together with the previous assumptions, assumption (i) implies $(p_{k}n_{k})/m_{k}\rightarrow\infty$, $n_{k}p_{k}q_{k}\rightarrow\infty$, and $p_{k}\min_{m}n_{k,m}\rightarrow\infty$. This last result, along with assumption (ii), ensures that $\widehat{\tau}_{k}^{\,\rm fixed}$ in (15) is well-defined with probability approaching one.
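The estimator in (15) can be computed directly by demeaning the treatment within clusters. The following is a minimal sketch, assuming the sampled units are supplied as `(y, w, m)` triples; the function name and data layout are illustrative choices, not part of the original text.

```python
from collections import defaultdict


def fixed_effect_estimate(sample):
    """Within-cluster (fixed effect) estimate from (y, w, m) triples of sampled units."""
    by_cluster = defaultdict(list)
    for y, w, m in sample:
        by_cluster[m].append((y, w))
    numerator = denominator = 0.0
    for units in by_cluster.values():
        # Cluster-level fraction of treated units among sampled units.
        w_bar = sum(w for _, w in units) / len(units)
        for y, w in units:
            numerator += y * (w - w_bar)
            denominator += w * (w - w_bar)
    if denominator == 0:
        raise ValueError("no within-cluster variation in treatment")
    return numerator / denominator
```

When outcomes equal a cluster-specific intercept plus a constant effect times the treatment, the cluster intercepts cancel and the function returns that constant effect.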

Let $\alpha_{k,m}=(1/n_{k,m})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}y_{k,i}(0)$. For an observation, $i$, with $m_{k,i}=m$, we define the within-cluster residuals $e_{k,i}(0)=y_{k,i}(0)-\alpha_{k,m}$ and $e_{k,i}(1)=y_{k,i}(1)-\tau_{k,m}-\alpha_{k,m}$. Let

\[
\tilde{v}_{k}=f_{k}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2} \tag{16}
\]

where

\begin{align*}
f_{k} &= E[A_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e_{k,i}^{2}(1)+E[A^{2}_{k,m}(1-A_{k,m})]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e_{k,i}^{2}(0)\\
&\quad-p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}(e_{k,i}(1)-e_{k,i}(0))^{2}\\
&\quad+\Big(E[A_{k,m}(1-A_{k,m})]-(5+p_{k})E[A^{2}_{k,m}(1-A_{k,m})^{2}]\\
&\qquad\qquad+2q_{k}(E[A_{k,m}(1-A_{k,m})])^{2}\Big)\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}\\
&\quad+\Big(p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]-p_{k}q_{k}(E[A_{k,m}(1-A_{k,m})])^{2}\Big)\sum_{m=1}^{m_{k}}\frac{n^{2}_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{align*}

Under additional regularity conditions, which are described in the Appendix, we obtain the large $k$ distribution of the fixed effects estimator,

\[
\sqrt{N_{k}}(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})/\tilde{v}_{k}^{1/2}\stackrel{d}{\longrightarrow}N(0,1). \tag{17}
\]

Let $\widetilde{U}_{k,i}=\widetilde{Y}_{k,i}-\widehat{\tau}_{k}^{\,\rm fixed}\widetilde{W}_{k,i}$, where $\widetilde{Y}_{k,i}=Y_{k,i}-\overline{Y}_{k,m_{k,i}}$, $\widetilde{W}_{k,i}=(W_{k,i}-\overline{W}_{k,m_{k,i}})$. The robust estimator of the variance of $\sqrt{N_{k}}(\widehat{\tau}^{\,\rm fixed}_{k}-\tau_{k})$ is

\[
\widetilde{V}_{k}^{\rm robust}=\left.\frac{1}{N_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\widetilde{U}_{k,i}^{2}\right/\left(\frac{1}{N_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\right)^{2}. \tag{18}
\]

Now let

\[
\tilde{v}_{k}^{\rm robust}=f_{k}^{\rm robust}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2},
\]

with

\begin{align*}
f_{k}^{\rm robust} &= E[A_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(1)+E[A^{2}_{k,m}(1-A_{k,m})]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(0)\\
&\quad+E[A_{k,m}(1-A_{k,m})(1-3A_{k,m}(1-A_{k,m}))]\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{align*}

Notice that all terms of $f_{k}^{\rm robust}$ are bounded. In the appendix, we show that

\[
\widetilde{V}_{k}^{\rm robust}=\tilde{v}_{k}^{\rm robust}+\scaleto{\mathcal{O}}{5pt}_{p}(1).
\]

The cluster variance estimator for fixed effects is

\[
\widetilde{V}_{k}^{\rm cluster}=\left.\frac{1}{N_{k}}\sum_{m=1}^{m_{k}}\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\widetilde{U}_{k,i}\right)^{2}\right/\left(\frac{1}{N_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\right)^{2}. \tag{19}
\]
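The robust and cluster variance estimators in (18) and (19) differ only in whether the scores $\widetilde{W}_{k,i}\widetilde{U}_{k,i}$ are squared unit by unit or summed within clusters first. A minimal sketch, assuming sampled units are given as `(y, w, m)` triples (the function name and data layout are illustrative assumptions):

```python
from collections import defaultdict


def fe_variances(sample):
    """Return (tau_hat, V_robust, V_cluster) for (y, w, m) triples of sampled units."""
    by_cluster = defaultdict(list)
    for y, w, m in sample:
        by_cluster[m].append((y, w))
    # Demean outcome and treatment within each cluster.
    demeaned = {}
    for m, units in by_cluster.items():
        y_bar = sum(y for y, _ in units) / len(units)
        w_bar = sum(w for _, w in units) / len(units)
        demeaned[m] = [(y - y_bar, w - w_bar) for y, w in units]
    n = len(sample)
    s_ww = sum(wt * wt for units in demeaned.values() for _, wt in units)
    s_wy = sum(wt * yt for units in demeaned.values() for yt, wt in units)
    if s_ww == 0:
        raise ValueError("no within-cluster variation in treatment")
    tau_hat = s_wy / s_ww
    # Scores: demeaned treatment times the residual from the demeaned regression.
    scores = {m: [wt * (yt - tau_hat * wt) for yt, wt in units]
              for m, units in demeaned.items()}
    denom = (s_ww / n) ** 2
    # Robust: square each unit's score; cluster: square each cluster's summed score.
    v_robust = sum(s * s for ss in scores.values() for s in ss) / n / denom
    v_cluster = sum(sum(ss) ** 2 for ss in scores.values()) / n / denom
    return tau_hat, v_robust, v_cluster
```

Both returned variances are on the scale of $\sqrt{N_k}(\widehat{\tau}_k^{\,\rm fixed}-\tau_k)$, so a standard error for the estimate itself is obtained by dividing by the sample size and taking a square root.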

Let

\[
\tilde{v}_{k}^{\rm cluster}=f_{k}^{\rm cluster}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2},
\]

with

\begin{align*}
f_{k}^{\rm cluster} &= E[A_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e_{k,i}^{2}(1)+E[A^{2}_{k,m}(1-A_{k,m})]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e_{k,i}^{2}(0)\\
&\quad-p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}(e_{k,i}(1)-e_{k,i}(0))^{2}\\
&\quad+\big(E[A_{k,m}(1-A_{k,m})]-(5+p_{k})E[A^{2}_{k,m}(1-A_{k,m})^{2}]\big)\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}\\
&\quad+p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{m=1}^{m_{k}}\frac{n^{2}_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{align*}

We obtain in the appendix,

\[
\frac{\widetilde{V}_{k}^{\rm cluster}}{\tilde{v}_{k}}=\frac{\tilde{v}_{k}^{\rm cluster}}{\tilde{v}_{k}}+\scaleto{\mathcal{O}}{5pt}_{p}(1).
\]

Similar to the least squares case, the robust variance can underestimate the true variance, and the cluster variance is generally too large. Our proposed variance estimator is a convex combination of $\widetilde{V}^{\rm cluster}_{k}$ and $\widetilde{V}^{\rm robust}_{k}$, with the weights selected to correct the bias of the cluster variance estimator as $k$ increases (see appendix for details),

\[
\widetilde{V}_{k}^{\rm CCV}=\widehat{\lambda}_{k}\widetilde{V}^{\rm cluster}_{k}+(1-\widehat{\lambda}_{k})\widetilde{V}^{\rm robust}_{k}, \tag{20}
\]

where the estimated weight for the cluster variance is

\[
\widehat{\lambda}_{k}=1-q_{k}\,\frac{\left(\displaystyle\frac{1}{M_{k}}\sum_{m=1}^{m_{k}}Q_{k,m}\overline{W}_{k,m}(1-\overline{W}_{k,m})\right)^{2}}{\displaystyle\frac{1}{M_{k}}\sum_{m=1}^{m_{k}}Q_{k,m}\overline{W}_{k,m}^{2}(1-\overline{W}_{k,m})^{2}},
\]

where $Q_{k,m}$ is an indicator that takes value one if cluster $m$ of population $k$ is sampled, and $M_{k}=\sum_{m=1}^{m_{k}}Q_{k,m}$ is the total number of sampled clusters. The second factor in the second term approximately (that is, ignoring the variance of $\overline{W}_{k,m}$ conditional on $A_{k,m}$) estimates the variance of $A_{k,m}(1-A_{k,m})$ divided by its second moment, so that

\[
\tilde{\lambda}\approx 1-q_{k}\frac{V(A_{k,m}(1-A_{k,m}))}{E[(A_{k,m}(1-A_{k,m}))^{2}]}.
\]

If there is no variation in $W_{k,i}$ within any of the clusters, the fixed effect estimator is not defined, and neither is this variance estimator. In all other cases the variance estimator is well-defined.
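Given robust and cluster variance estimates, the CCV combination in (20) only requires the treated fractions of the sampled clusters and $q_k$. The sketch below follows the displayed formula for $\widehat{\lambda}_k$; the function names and the choice to pass the two variance estimates as plain numbers are assumptions made for illustration.

```python
def ccv_weight(treated_fractions, q):
    """Estimated weight on the cluster variance, from sampled clusters' treated fractions."""
    x = [w * (1 - w) for w in treated_fractions]
    mean_x = sum(x) / len(x)
    mean_x2 = sum(v * v for v in x) / len(x)
    if mean_x2 == 0:
        # Corresponds to the undefined case: no cluster has treatment variation.
        raise ValueError("no within-cluster treatment variation in any sampled cluster")
    return 1 - q * mean_x ** 2 / mean_x2


def ccv_variance(v_cluster, v_robust, treated_fractions, q):
    """Convex combination of the cluster and robust variance estimates."""
    lam = ccv_weight(treated_fractions, q)
    return lam * v_cluster + (1 - lam) * v_robust
```

Because the squared mean never exceeds the mean of squares, the weight lies between zero and one for $q_k\in(0,1]$. When all clusters have the same treated fraction and $q_k=1$, the weight is zero and the combination reduces to the robust variance.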

We also consider a bootstrap standard error, based on the same resampling procedure described in Section 4.3.

6 Simulations

We next report simulation results that illustrate the performance of the proposed variance estimators relative to existing alternatives. To operate in an empirically relevant setting, we create an artificial population based on the Census data briefly described in the introduction, which contains information on log earnings, an indicator for college attendance, and an indicator for state of residence for 2,632,838 individuals.

For each individual in this population of 2,632,838 individuals, we define $m_{k,i}$ using state of residence (plus Washington, DC, and Puerto Rico), for a total of 52 clusters. We assign potential outcomes as $y_{k,i}(0)=Y_{k,i}-\widehat{\tau}_{k,m}W_{k,i}$ and $y_{k,i}(1)=Y_{k,i}+\widehat{\tau}_{k,m}(1-W_{k,i})$, so treatment effects are constant within clusters. We then repeatedly create samples from this population. Creating a sample requires fixing $p_{k}$, $q_{k}$, and the distribution of $A_{k,m}$, and then drawing from the implied distribution for $R_{k,i}$ and $W_{k,i}$ to generate outcomes for all sampled units. In the baseline design, we set $p_{k}=q_{k}=1$, so we sample all $m_{k}=52$ clusters and all $n_{k}=2{,}632{,}838$ individuals in the population. For the assignment mechanism in the baseline design, we convert cluster means of the treatment variable into log-odds, $\widehat{\ell}_{k,m}=\ln(\overline{W}_{k,m}/(1-\overline{W}_{k,m}))$. Let $(\widehat{\mu}_{\ell},\widehat{\sigma}_{\ell})$ be the average and the sample standard deviation of $\widehat{\ell}_{k,m}$. We then draw $\ln(A_{k,m}/(1-A_{k,m}))$ for cluster $m$ from a normal distribution with expected value $\widehat{\mu}_{\ell}$ and standard deviation $\widehat{\sigma}_{\ell}$. Given the cluster assignment probability $A_{k,m}$, we assign the treatment in cluster $m$ by drawing from a binomial distribution with parameter $A_{k,m}$.
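The assignment mechanism of the baseline design can be sketched as follows. This is an illustration under simplifying assumptions: it takes the observed cluster treated fractions and cluster sizes as plain lists, treats the binomial draw as independent unit-level draws with probability $A_{k,m}$, and uses a hypothetical function name.

```python
import math
import random
import statistics


def draw_assignments(treated_fractions, cluster_sizes, rng):
    """Draw A_{k,m} from a normal fit to observed log-odds, then unit-level treatments."""
    log_odds = [math.log(w / (1 - w)) for w in treated_fractions]
    mu = statistics.fmean(log_odds)
    sigma = statistics.stdev(log_odds)  # sample standard deviation
    assignments = []
    for n_m in cluster_sizes:
        ell = rng.gauss(mu, sigma)
        a = 1 / (1 + math.exp(-ell))  # invert the log-odds transform
        # Each unit in the cluster is treated independently with probability a.
        assignments.append([1 if rng.random() < a else 0 for _ in range(n_m)])
    return assignments
```

The observed fractions must lie strictly between zero and one for the log-odds to be defined, which matches the requirement stated for the bootstrap procedure.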

We calculate the standard deviation of the least squares and fixed effect estimators, normalized by the square root of the sample size, $N_{k}^{1/2}\mbox{s.d.}$, across 10,000 samples drawn according to the procedure outlined above. This is the benchmark against which we compare the various estimates of standard errors. For the least squares and the fixed effects estimators, respectively, we first calculate the (infeasible) asymptotic standard errors $v_{k}^{1/2}$ and $\widetilde{v}_{k}^{1/2}$ to benchmark the performance of the feasible variance estimators. Next, we calculate the averages across 10,000 simulations of the robust, cluster, CCV, and TSCB standard errors, where we use 100 bootstrap replications in each simulation. Table 2 reports the results. Table LABEL:table:coverage_rates reports coverage rates for 95 percent confidence intervals. In the design column of the two tables $\sigma_{\tau_{k}}$ is the standard deviation of the cluster average treatment effect.

Table 2: Average standard errors across simulations

Normalized standard error
        $N_{k}^{1/2}$s.d.   $v_{k}^{1/2}$   $\widetilde{v}_{k}^{1/2}$   robust   cluster   CCV     TSCB
Baseline design: $p_{k}=1$, $q_{k}=1$, $\sigma_{\tau_{k}}=.120$, $\sigma_{k}=.057$
OLS     5.91                5.90            -                           1.90     44.86     6.32    5.80
FE      2.34                -               2.32                        1.90     44.63     2.31    2.29
Second design: $p_{k}=.1$, $q_{k}=1$, $\sigma_{\tau_{k}}=.120$, $\sigma_{k}=.057$
OLS     2.61                2.59            -                           1.90     14.28     3.78    2.60
FE      1.95                -               1.95                        1.90     14.21     1.95    1.94
Third design: $p_{k}=.1$, $q_{k}=1$, $\sigma_{\tau_{k}}=.480$, $\sigma_{k}=.206$
OLS     14.50               14.17           -                           1.98     56.46     13.70   14.33
FE      12.14               -               11.89                       2.13     56.79     11.61   12.07
Fourth design: $p_{k}=.1$, $q_{k}=1$, $\sigma_{\tau_{k}}=0$, $\sigma_{k}=.206$
OLS     9.39                9.39            -                           1.90     8.20      9.19    9.37
FE      2.04                -               2.04                        2.04     1.97      2.04    2.09
Fifth design: $p_{k}=.1$, $q_{k}=1$, $\sigma_{\tau_{k}}=.480$, $\sigma_{k}=0$
OLS     1.95                1.97            -                           1.97     56.42     4.53    2.04
FE      1.91                -               1.94                        1.94     56.42     1.96    1.90

Notes: $N_{k}^{1/2}$s.d. is the standard deviation of the estimators over the simulations, multiplied by the square root of the sample size. $v_{k}^{1/2}$ is the square root of the asymptotic variance in equation (3.1). $\tilde{v}_{k}^{1/2}$ is the square root of the asymptotic variance of the fixed effect estimator in (16). The remaining four columns report average values of robust, cluster, CCV, and TSCB standard errors across simulations (multiplied by $N_{k}^{1/2}$). $p_{k}$ and $q_{k}$ are the unit and cluster sampling probabilities, respectively. $\sigma_{\tau_{k}}$ is the standard deviation of the cluster average treatment effect. $\sigma_{k}$ is the standard deviation across clusters of the treatment assignment probabilities.

For the baseline design, the normalized standard deviation of the least squares estimator is 5.91. This is well approximated by the asymptotic standard error, 5.90. The robust standard error is on average over the simulations 1.90, less than one-third of the normalized standard deviation of the estimator. The cluster standard error is far too large, on average 44.86, more than seven times the value of the normalized standard deviation. CCV improves considerably over robust and cluster. The average CCV standard error is 6.32, about 7 percent higher than the normalized standard deviation. The TSCB standard error is the most accurate, on average equal to 5.80. For the fixed effect estimator, the asymptotic standard error is again accurate. The robust standard error is about 16 percent too small, leading to a coverage rate for the nominal 95 percent confidence interval of 0.89 in Table LABEL:table:coverage_rates. The cluster standard error is too large by a factor of 20. CCV and TSCB standard errors closely approximate the normalized standard error.


A.1 Setting and notation

We have a sequence of populations indexed by $k$. The $k$-th population has $n_{k}$ units, indexed by $i=1,\ldots,n_{k}$. The population is partitioned into $m_{k}$ strata or clusters. Let $m_{k,i}\in\{1,\ldots,m_{k}\}$ denote the stratum that unit $i$ of population $k$ belongs to. The number of units in cluster $m$ of population $k$ is $n_{k,m}\geq 1$. For each unit, $i$, there are two potential outcomes, $y_{k,i}(1)$ and $y_{k,i}(0)$, corresponding to treatment and no treatment. The parameter of interest is the population average treatment effect

\[
\tau_{k}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}(y_{k,i}(1)-y_{k,i}(0)).
\]

The population treatment effect by cluster is

\[
\tau_{k,m}=\frac{1}{n_{k,m}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(y_{k,i}(1)-y_{k,i}(0)).
\]

Therefore,

\[
\tau_{k}=\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}\tau_{k,m}.
\]

We will assume that potential outcomes, $y_{k,i}(1)$ and $y_{k,i}(0)$, are bounded in absolute value, uniformly for all $(k,i)$.

We next describe the two components of the stochastic nature of the sample. There is a stochastic binary treatment for each unit in each population, $W_{k,i}\in\{0,1\}$. The realized outcome for unit $i$ in population $k$ is $Y_{k,i}=y_{k,i}(W_{k,i})$. For a random sample of the population, we observe the triple $(Y_{k,i},W_{k,i},m_{k,i})$. Inclusion in the sample is represented by the random variable $R_{k,i}$, which takes value one if unit $i$ belongs to the sample, and value zero if not.

The sampling process that determines the values of $R_{k,i}$ is independent of the potential outcomes and the assignments. It consists of two stages. First, clusters are sampled with cluster sampling probability $q_{k}\in(0,1]$. Second, units are sampled from the subpopulation consisting of all the sampled clusters, with unit sampling probability equal to $p_{k}\in(0,1]$. Both $q_{k}$ and $p_{k}$ may be equal to one, or close to zero. If $q_{k}=1$, we sample all clusters. If $p_{k}=1$, we sample all units from the sampled clusters. If $q_{k}=p_{k}=1$, all units in the population are sampled.

The assignment process that determines the values of $W_{k,i}$ also consists of two stages. In the first stage, for cluster $m$ in population $k$, an assignment probability $A_{k,m}\in[0,1]$ is drawn randomly from a distribution with mean $\mu_{k}$, bounded away from zero and one uniformly in $k$, and variance $\sigma^{2}_{k}$, independently for each cluster. The variance $\sigma^{2}_{k}$ is key. If it is zero, we have random assignment across clusters. For positive values of $\sigma_{k}^{2}$ we have correlated assignment within the clusters. Because $A_{k,m}^{2}\leq A_{k,m}$, it follows that $\sigma_{k}^{2}$ is bounded above by $\mu_{k}(1-\mu_{k})$ and that the bound is attained when $A_{k,m}$ can only take values zero or one (so all units within a cluster have the same values for the treatment). In the second stage, each unit in cluster $m$ is assigned to the treatment independently, with cluster-specific probability $A_{k,m}$.
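The upper bound on $\sigma_{k}^{2}$ can be checked in one line using only the definitions of $\mu_{k}$ and $\sigma_{k}^{2}$ given in this section:

\[
\sigma_{k}^{2}=E[A_{k,m}^{2}]-\mu_{k}^{2}\leq E[A_{k,m}]-\mu_{k}^{2}=\mu_{k}(1-\mu_{k}),
\]

with equality if and only if $A_{k,m}^{2}=A_{k,m}$ with probability one, that is, $A_{k,m}\in\{0,1\}$.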

A.2 Base case: Difference in means

Let

\[
N_{k,1}=\sum_{i=1}^{n_{k}}R_{k,i}W_{k,i}\quad\mbox{ and }\quad N_{k,0}=\sum_{i=1}^{n_{k}}R_{k,i}(1-W_{k,i})
\]

be the number of treated and untreated units in the sample, respectively. The total sample size is $N_{k}=N_{k,1}+N_{k,0}$. We consider the simple difference of means between treated and non-treated, which is obtained as the coefficient on the treatment indicator in a regression of the outcome on a constant and the treatment,

\[
\widehat{\tau}_{k}=\frac{1}{N_{k,1}\vee 1}\sum_{i=1}^{n_{k}}R_{k,i}W_{k,i}Y_{k,i}-\frac{1}{N_{k,0}\vee 1}\sum_{i=1}^{n_{k}}R_{k,i}(1-W_{k,i})Y_{k,i}.
\]

We make the following assumptions about the sampling process and the cluster sizes: (i) $q_{k}m_{k}\rightarrow\infty$, (ii) $\liminf_{k\rightarrow\infty}p_{k}\min_{m}n_{k,m}>0$, and (iii) $\limsup_{k\rightarrow\infty}\max_{m}n_{k,m}/\min_{m}n_{k,m}<\infty$. The first assumption implies that the expected number of sampled clusters goes to infinity as $k$ increases. The second assumption implies that the average number of observations sampled per cluster, conditional on the cluster being sampled, does not go to zero. The third assumption restricts the imbalance between the number of units across clusters. Notice that assumptions (i) and (ii) imply $n_{k}p_{k}q_{k}\rightarrow\infty$, so the sample size becomes larger in expectation as $k$ increases.
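To see why assumptions (i) and (ii) deliver this last implication, note that $n_{k}=\sum_{m=1}^{m_{k}}n_{k,m}\geq m_{k}\min_{m}n_{k,m}$. A brief sketch of the argument is then

\[
n_{k}p_{k}q_{k}\geq(q_{k}m_{k})\,(p_{k}\min_{m}n_{k,m})\rightarrow\infty,
\]

because the first factor diverges by (i) and the second has a strictly positive limit inferior by (ii).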

A.2.1 Large k𝑘k distribution

Let αk=(1/nk)i=1nkyk,i(0)subscript𝛼𝑘1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘subscript𝑦𝑘𝑖0\alpha_{k}=(1/n_{k})\sum_{i=1}^{n_{k}}y_{k,i}(0) and τk=(1/nk)i=1nk(yk,i(1)yi,k(0))subscript𝜏𝑘1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘subscript𝑦𝑘𝑖1subscript𝑦𝑖𝑘0\tau_{k}=(1/n_{k})\sum_{i=1}^{n_{k}}(y_{k,i}(1)-y_{i,k}(0)), uk,i(1)=yk,i(1)(αk+τk)subscript𝑢𝑘𝑖1subscript𝑦𝑘𝑖1subscript𝛼𝑘subscript𝜏𝑘u_{k,i}(1)=y_{k,i}(1)-(\alpha_{k}+\tau_{k}), and uk,i(0)=yk,i(0)αksubscript𝑢𝑘𝑖0subscript𝑦𝑘𝑖0subscript𝛼𝑘u_{k,i}(0)=y_{k,i}(0)-\alpha_{k}. Notice that,

i=1nkuk,i(1)=i=1nkuk,i(0)=0.superscriptsubscript𝑖1subscript𝑛𝑘subscript𝑢𝑘𝑖1superscriptsubscript𝑖1subscript𝑛𝑘subscript𝑢𝑘𝑖00\sum_{i=1}^{n_{k}}u_{k,i}(1)=\sum_{i=1}^{n_{k}}u_{k,i}(0)=0.

This implies

\[
\sqrt{n_kp_kq_k}\,(\widehat{\tau}_k-\tau_k)=\frac{b_{k,1}}{\widehat{b}_{k,1}}\widehat{a}_{k,1}-\frac{b_{k,0}}{\widehat{b}_{k,0}}\widehat{a}_{k,0},
\]

where

\begin{align*}
\widehat{a}_{k,1}&=\frac{1}{\sqrt{n_kp_kq_k}\,\mu_k}\sum_{i=1}^{n_k}\big(R_{k,i}W_{k,i}-p_kq_k\mu_k\big)u_{k,i}(1),\\
\widehat{a}_{k,0}&=\frac{1}{\sqrt{n_kp_kq_k}\,(1-\mu_k)}\sum_{i=1}^{n_k}\big(R_{k,i}(1-W_{k,i})-p_kq_k(1-\mu_k)\big)u_{k,i}(0),
\end{align*}

$\widehat{b}_{k,1}=(N_{k,1}\vee 1)/n_k$, $\widehat{b}_{k,0}=(N_{k,0}\vee 1)/n_k$, $b_{k,1}=p_kq_k\mu_k$, and $b_{k,0}=p_kq_k(1-\mu_k)$. We will first derive the large sample distribution of

\begin{align*}
\widehat{a}_k&=\widehat{a}_{k,1}-\widehat{a}_{k,0}\\
&=\sum_{m=1}^{m_k}\big(\xi_{k,m,1}-\xi_{k,m,0}\big),
\end{align*}

where

\[
\xi_{k,m,1}=\frac{1}{\sqrt{n_kp_kq_k}\,\mu_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(R_{k,i}W_{k,i}-p_kq_k\mu_k\big)u_{k,i}(1),
\]

and

\[
\xi_{k,m,0}=\frac{1}{\sqrt{n_kp_kq_k}\,(1-\mu_k)}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(R_{k,i}(1-W_{k,i})-p_kq_k(1-\mu_k)\big)u_{k,i}(0).
\]
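The decomposition of $\sqrt{n_kp_kq_k}(\widehat{\tau}_k-\tau_k)$ into the $\widehat{a}$ and $\widehat{b}$ terms is an algebraic identity whenever $N_{k,1}\geq 1$ and $N_{k,0}\geq 1$, and it can be checked numerically. The sketch below assumes that $\widehat{\tau}_k$ is the difference in sample means with denominators $N_{k,1}\vee 1$ and $N_{k,0}\vee 1$ (the estimator is not restated in this section), and uses a hypothetical population, independent treatment assignment, and illustrative values of $p_k$, $q_k$, and $\mu_k$.

```python
import math
import random

rng = random.Random(0)

# Hypothetical finite population: 30 clusters of 10 units, bounded potential outcomes.
m_k, size = 30, 10
cluster = [m for m in range(m_k) for _ in range(size)]
n_k = len(cluster)
y0 = [rng.uniform(0, 1) for _ in range(n_k)]
y1 = [a + rng.uniform(0, 1) for a in y0]
p, q, mu = 0.8, 0.8, 0.5          # illustrative values of p_k, q_k, mu_k

alpha = sum(y0) / n_k
tau = sum(b - a for a, b in zip(y0, y1)) / n_k
u1 = [b - (alpha + tau) for b in y1]   # u_{k,i}(1)
u0 = [a - alpha for a in y0]           # u_{k,i}(0)

# One draw of the sampling indicators R and the treatment indicators W.
sampled = [rng.random() < q for _ in range(m_k)]
R = [1 if sampled[m] and rng.random() < p else 0 for m in cluster]
W = [1 if rng.random() < mu else 0 for _ in range(n_k)]

N1 = sum(r * w for r, w in zip(R, W))
N0 = sum(r * (1 - w) for r, w in zip(R, W))
y_obs = [w * b + (1 - w) * a for w, a, b in zip(W, y0, y1)]

# Assumed estimator: difference in means among sampled treated and sampled control units.
tau_hat = (sum(r * w * y for r, w, y in zip(R, W, y_obs)) / max(N1, 1)
           - sum(r * (1 - w) * y for r, w, y in zip(R, W, y_obs)) / max(N0, 1))

scale = math.sqrt(n_k * p * q)
a1 = sum((r * w - p * q * mu) * u for r, w, u in zip(R, W, u1)) / (scale * mu)
a0 = sum((r * (1 - w) - p * q * (1 - mu)) * u
         for r, w, u in zip(R, W, u0)) / (scale * (1 - mu))
b1, b0 = p * q * mu, p * q * (1 - mu)
b1_hat, b0_hat = max(N1, 1) / n_k, max(N0, 1) / n_k

lhs = scale * (tau_hat - tau)
rhs = (b1 / b1_hat) * a1 - (b0 / b0_hat) * a0
print("difference between the two sides:", abs(lhs - rhs))
```

The identity relies only on $\sum_i u_{k,i}(1)=\sum_i u_{k,i}(0)=0$, so it holds for any realization of $R$ and $W$ with at least one sampled treated unit and one sampled control unit.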

Notice that $E[\xi_{k,m,1}]=E[\xi_{k,m,0}]=0$. Moreover, notice that the terms $\xi_{k,m,1}-\xi_{k,m,0}$ are independent across clusters, $m$. In addition,

\begin{align*}
E[\xi_{k,m,1}^2]={}&\frac{1}{n_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\frac{1-p_kq_k\mu_k}{\mu_k}u_{k,i}^2(1)\\
&+\frac{2}{n_k}\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\frac{p_k\big(\sigma_k^2+\mu_k^2(1-q_k)\big)}{\mu_k^2}u_{k,i}(1)u_{k,j}(1),\\
E[\xi_{k,m,0}^2]={}&\frac{1}{n_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\frac{1-p_kq_k(1-\mu_k)}{1-\mu_k}u_{k,i}^2(0)\\
&+\frac{2}{n_k}\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\frac{p_k\big(\sigma_k^2+(1-\mu_k)^2(1-q_k)\big)}{(1-\mu_k)^2}u_{k,i}(0)u_{k,j}(0),
\end{align*}

and

\begin{align*}
E[\xi_{k,m,1}\xi_{k,m,0}]={}&-\frac{1}{n_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\,p_kq_k\,u_{k,i}(1)u_{k,i}(0)\\
&+\frac{1}{n_k}\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\frac{p_k\big(\mu_k(1-\mu_k)(1-q_k)-\sigma_k^2\big)}{\mu_k(1-\mu_k)}\big(u_{k,i}(0)u_{k,j}(1)+u_{k,i}(1)u_{k,j}(0)\big).
\end{align*}

We obtain:

\begin{align*}
n_kE[(\xi_{k,m,1}-\xi_{k,m,0})^2]
={}&\frac{1}{\mu_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}u^2_{k,i}(1)+\frac{1}{1-\mu_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}u^2_{k,i}(0)\\
&+2p_k\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\Big(u_{k,i}(1)u_{k,j}(1)+u_{k,i}(0)u_{k,j}(0)-u_{k,i}(0)u_{k,j}(1)-u_{k,i}(1)u_{k,j}(0)\Big)\\
&-p_kq_k\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u^2_{k,i}(1)+u^2_{k,i}(0)-2u_{k,i}(1)u_{k,i}(0)\big)\\
&\qquad+2\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\big(u_{k,i}(1)u_{k,j}(1)+u_{k,i}(0)u_{k,j}(0)-u_{k,i}(0)u_{k,j}(1)-u_{k,i}(1)u_{k,j}(0)\big)\Bigg)\\
&+2p_k\sigma^2_k\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k}1\{m_{k,i}=m_{k,j}=m\}\Bigg(\frac{u_{k,i}(1)u_{k,j}(1)}{\mu^2_k}+\frac{u_{k,i}(0)u_{k,j}(0)}{(1-\mu_k)^2}+\frac{u_{k,i}(0)u_{k,j}(1)}{\mu_k(1-\mu_k)}+\frac{u_{k,i}(1)u_{k,j}(0)}{\mu_k(1-\mu_k)}\Bigg).
\end{align*}

Therefore,

\begin{align*}
n_kE[(\xi_{k,m,1}-\xi_{k,m,0})^2]
={}&\frac{1}{\mu_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}u^2_{k,i}(1)+\frac{1}{1-\mu_k}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}u^2_{k,i}(0)\\
&+p_k\Bigg[\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2-\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2\Bigg]\\
&-p_kq_k\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2\\
&+p_k\sigma^2_k\Bigg[\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2-\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)^2\Bigg].
\end{align*}

Let $v_k=\sum_{m=1}^{m_k}E[(\xi_{k,m,1}-\xi_{k,m,0})^2]$, then

\begin{align*}
n_kv_k={}&\sum_{i=1}^{n_k}\bigg(\frac{u^2_{k,i}(1)}{\mu_k}+\frac{u^2_{k,i}(0)}{1-\mu_k}\bigg)\\
&+p_k\sum_{m=1}^{m_k}\Bigg[\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2-\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2\Bigg]\\
&-p_kq_k\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2\\
&+p_k\sigma^2_k\sum_{m=1}^{m_k}\Bigg[\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2-\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)^2\Bigg].
\end{align*}

Alternatively, we can write this expression as

\begin{align*}
n_kv_k={}&\sum_{i=1}^{n_k}\bigg(\frac{u^2_{k,i}(1)}{\mu_k}+\frac{u^2_{k,i}(0)}{1-\mu_k}\bigg)\\
&-p_k\sum_{i=1}^{n_k}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2-p_k\sigma_k^2\sum_{i=1}^{n_k}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)^2\\
&+p_k(1-q_k)\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2\\
&+p_k\sigma^2_k\sum_{m=1}^{m_k}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2.
\end{align*}

The sum of the first three terms is minimized for $p_k=1$ and $\sigma_k^2=\mu_k(1-\mu_k)$, in which case this sum is equal to zero. Therefore,

\begin{align*}
v_k\geq{}&(p_k\min_m n_{k,m})(1-q_k)\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}\Bigg(\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2\\
&+(p_k\min_m n_{k,m})\,\sigma^2_k\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}\Bigg(\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2.\tag{A.1}
\end{align*}
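The algebra leading to (A.1) can be sanity-checked numerically. The sketch below evaluates both expressions for $n_kv_k$, the sum of the first three terms at $p_k=1$ and $\sigma_k^2=\mu_k(1-\mu_k)$, and the lower bound in (A.1). The population, the cluster sizes, and the parameter values are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical population with unequal cluster sizes and bounded potential outcomes.
sizes = [4, 6, 8, 5, 7, 10]
cluster = [m for m, s in enumerate(sizes) for _ in range(s)]
n_k = len(cluster)
y0 = [rng.uniform(0, 1) + 0.2 * m for m in cluster]
y1 = [a + rng.uniform(0, 1) + 0.1 * m for a, m in zip(y0, cluster)]
p, q, mu, sigma2 = 0.6, 0.7, 0.4, 0.05   # illustrative p_k, q_k, mu_k, sigma_k^2

alpha = sum(y0) / n_k
tau = sum(y1) / n_k - alpha
u0 = [a - alpha for a in y0]
u1 = [b - alpha - tau for b in y1]
d = [b - a for a, b in zip(u0, u1)]                   # u(1) - u(0)
e = [b / mu + a / (1 - mu) for a, b in zip(u0, u1)]   # u(1)/mu + u(0)/(1 - mu)

def cluster_sums(x):
    """Sum the entries of x within each cluster."""
    totals = [0.0] * len(sizes)
    for m, xi in zip(cluster, x):
        totals[m] += xi
    return totals

def sq(x):
    """Sum of squares."""
    return sum(xi * xi for xi in x)

def weighted_mean_sq(S):
    """sum_m (n_{k,m}/n_k) * (cluster mean)^2, written with cluster sums S."""
    return sum(s * s / nm for s, nm in zip(S, sizes)) / n_k

A = sum(b * b / mu + a * a / (1 - mu) for a, b in zip(u0, u1))
D, E = cluster_sums(d), cluster_sums(e)

# First expression for n_k v_k (bracketed cluster-level terms).
form1 = A + p * (sq(D) - sq(d)) - p * q * sq(D) + p * sigma2 * (sq(E) - sq(e))
# Alternative expression (unit-level terms first, cluster-level terms last).
form2 = A - p * sq(d) - p * sigma2 * sq(e) + p * (1 - q) * sq(D) + p * sigma2 * sq(E)

# First three terms of the alternative expression at p = 1, sigma2 = mu (1 - mu).
three_terms_at_min = A - sq(d) - mu * (1 - mu) * sq(e)

v_k = form2 / n_k
bound = p * min(sizes) * ((1 - q) * weighted_mean_sq(D) + sigma2 * weighted_mean_sq(E))
print(abs(form1 - form2), abs(three_terms_at_min), v_k >= bound)
```

The bound uses $\sigma_k^2\leq\mu_k(1-\mu_k)$, which the illustrative values satisfy, so that the first three terms are nonnegative.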

We will assume that $\liminf_{k\rightarrow\infty}\big((1-q_k)\vee\sigma_k^2\big)>0$, so either sampling or assignment or both are correlated within cluster. (We study the case $q_k=1$ and $\sigma_k^2=0$ separately below.) In addition, assume (i) $\liminf_{k\rightarrow\infty}(1-q_k)>0$ and

\[
\liminf_{k\rightarrow\infty}\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}\Bigg(\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^2>0,\tag{A.2}
\]

or (ii) $\liminf_{k\rightarrow\infty}\sigma_k^2>0$ and

\[
\liminf_{k\rightarrow\infty}\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}\Bigg(\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_k}+\frac{u_{k,i}(0)}{1-\mu_k}\bigg)\Bigg)^2>0.\tag{A.3}
\]

Equation (A.2) would be violated if, as $k$ increases, there is no variation in average treatment effects across clusters. Equation (A.3) would be violated if as $k$ increases there is no variation in average potential outcomes across clusters. If equations (A.2) and (A.3) hold, $v_k$ is bounded below by a term of order at least $p_k\min_m n_{k,m}$. Recall our assumption, $\liminf_{k\rightarrow\infty}p_k\min_m n_{k,m}>0$, so the average number of observations sampled per cluster, conditional on the cluster being sampled, does not go to zero. Then,

\[
\liminf_{k\rightarrow\infty}v_k>0.
\]

To obtain a CLT, we will check Lyapunov’s condition,

\[
\lim_{k\rightarrow\infty}\sum_{m=1}^{m_k}\frac{1}{v_k^{1+\delta/2}}E\big[|\xi_{k,m,1}-\xi_{k,m,0}|^{2+\delta}\big]=0,
\]

for some $\delta>0$. Because potential outcomes are uniformly bounded and $\mu_k$ is uniformly bounded away from zero, we obtain

\[
|\xi_{k,m,1}|^{2+\delta}\leq c\,\frac{n_{k,m}^{2+\delta}}{(n_kp_kq_k)^{1+\delta/2}}\left|\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\,|R_{k,i}W_{k,i}-p_kq_k\mu_k|\right|^{2+\delta},
\]

where $c$ is some generic positive constant, whose value may change across equations. Consider $\delta=1$, and let

\begin{align*}
S_{k,m,1}^3={}&E\left[\left|\frac{1}{n_{k,m}}\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\,|R_{k,i}W_{k,i}-p_kq_k\mu_k|\right|^3\right]\\
\leq{}&\frac{1}{n_{k,m}^3}\,n_{k,m}\,E\big[|R_{k,i}W_{k,i}-p_kq_k\mu_k|^3\big]\\
&+\frac{3}{n_{k,m}^3}\,n_{k,m}(n_{k,m}-1)\,E\big[|R_{k,i}W_{k,i}-p_kq_k\mu_k|^2\,|R_{k,j}W_{k,j}-p_kq_k\mu_k|\;\big|\;m_{k,i}=m_{k,j}=m\big]\\
&+\frac{6}{n_{k,m}^3}\binom{n_{k,m}}{3}E\big[|R_{k,i}W_{k,i}-p_kq_k\mu_k|\,|R_{k,j}W_{k,j}-p_kq_k\mu_k|\,|R_{k,t}W_{k,t}-p_kq_k\mu_k|\;\big|\;m_{k,i}=m_{k,j}=m_{k,t}=m\big],
\end{align*}

for $i\neq j\neq t$. (The second and third terms on the right-hand side of the last equation only appear when $n_{k,m}\geq 2$ and $n_{k,m}\geq 3$, respectively.) As a result,

\begin{align*}
S_{k,m,1}^3&\leq c\left(\frac{p_kq_k}{n_{k,m}^2}+\frac{p_k^2q_k}{n_{k,m}}+p_k^3q_k\right)\\
&\leq c\,p_k^3q_k\left(\frac{1}{p_k^2\min_m n_{k,m}^2}+\frac{1}{p_k\min_m n_{k,m}}+1\right).
\end{align*}

Because $\liminf_{k\rightarrow\infty}p_k\min_m n_{k,m}>0$, for large enough $k$ we obtain,

\[
E[|\xi_{k,m,1}|^3]\leq c\,\frac{p_k^3q_kn_{k,m}^3}{(n_kp_kq_k)^{3/2}},
\]

and the same bound applies for $E[|\xi_{k,m,0}|^3]$. Notice that

\begin{align*}
\sum_{m=1}^{m_k}E[|\xi_{k,m,1}-\xi_{k,m,0}|^3]\leq{}&\sum_{m=1}^{m_k}E[(|\xi_{k,m,1}|+|\xi_{k,m,0}|)^3]\\
={}&\sum_{m=1}^{m_k}E[|\xi_{k,m,1}|^3]+\sum_{m=1}^{m_k}E[|\xi_{k,m,0}|^3]\\
&+3\sum_{m=1}^{m_k}E[|\xi_{k,m,1}|^2|\xi_{k,m,0}|]+3\sum_{m=1}^{m_k}E[|\xi_{k,m,1}||\xi_{k,m,0}|^2].
\end{align*}

Now, Hölder’s inequality implies that

\[
\frac{p_k^3q_k\sum_{m=1}^{m_k}n_{k,m}^3}{v_k^{3/2}(n_kp_kq_k)^{3/2}}\longrightarrow 0,\tag{A.4}
\]

is sufficient for the Lyapunov condition to hold. Because $\max_m n_{k,m}/\min_m n_{k,m}$ is bounded asymptotically, we obtain,

\begin{align*}
\limsup_{k\rightarrow\infty}\frac{p_k^3q_k\sum_{m=1}^{m_k}n_{k,m}^3}{v_k^{3/2}(n_kp_kq_k)^{3/2}}&\leq\limsup_{k\rightarrow\infty}c\,\frac{p_k^3q_km_k\max_m n_{k,m}^3}{(p_k^2q_km_k\min_m n_{k,m}^2)^{3/2}}\\
&\leq\limsup_{k\rightarrow\infty}\left(\frac{\max_m n_{k,m}}{\min_m n_{k,m}}\right)^3\frac{c}{\sqrt{q_km_k}}=0,
\end{align*}

and so the Lyapunov condition holds. As a result, we obtain

\[
\widehat{a}_k/\sqrt{v_k}\stackrel{d}{\longrightarrow}N(0,1).
\]
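A small Monte Carlo experiment can illustrate this limit. The sketch below assumes a design that is consistent with the moment formulas above but is not restated in this section: clusters are sampled independently with probability $q_k$, units in sampled clusters are sampled independently with probability $p_k$, and each cluster draws an assignment probability with mean $\mu_k$ and variance $\sigma_k^2$ (here a two-point distribution, which is an illustrative choice), with units then assigned independently. All numerical values and variable names are hypothetical.

```python
import math
import random

rng = random.Random(2)

# Hypothetical design (all numbers are illustrative assumptions).
m_k, size = 40, 5                    # 40 clusters of 5 units each
n_k = m_k * size
p, q, mu, s = 0.5, 0.5, 0.5, 0.2     # cluster assignment probability is mu - s or mu + s
sigma2 = s * s                       # its variance across clusters

# Bounded potential outcomes with cluster-level heterogeneity.
effect0 = [rng.uniform(0, 1) for _ in range(m_k)]
effect1 = [rng.uniform(0, 1) for _ in range(m_k)]
y0 = [effect0[i // size] + rng.uniform(0, 1) for i in range(n_k)]
y1 = [y0[i] + effect1[i // size] + rng.uniform(0, 1) for i in range(n_k)]
alpha = sum(y0) / n_k
tau = sum(y1) / n_k - alpha
u0 = [a - alpha for a in y0]
u1 = [b - alpha - tau for b in y1]

def sq(x):
    """Sum of squares."""
    return sum(xi * xi for xi in x)

def cluster_sums(x):
    """Sum the entries of x within each (equal-sized) cluster."""
    return [sum(x[m * size:(m + 1) * size]) for m in range(m_k)]

# v_k from the alternative expression for n_k v_k in the text.
d = [b - a for a, b in zip(u0, u1)]
e = [b / mu + a / (1 - mu) for a, b in zip(u0, u1)]
A = sum(b * b / mu + a * a / (1 - mu) for a, b in zip(u0, u1))
v_k = (A - p * sq(d) - p * sigma2 * sq(e)
       + p * (1 - q) * sq(cluster_sums(d))
       + p * sigma2 * sq(cluster_sums(e))) / n_k

def draw_a_hat():
    """One draw of a_hat_k = a_hat_{k,1} - a_hat_{k,0} under the assumed design."""
    total = 0.0
    for m in range(m_k):
        in_sample = rng.random() < q                      # is the cluster sampled?
        prob = mu + s if rng.random() < 0.5 else mu - s   # cluster assignment probability
        for i in range(m * size, (m + 1) * size):
            r = 1 if in_sample and rng.random() < p else 0
            w = 1 if rng.random() < prob else 0
            total += ((r * w - p * q * mu) * u1[i] / mu
                      - (r * (1 - w) - p * q * (1 - mu)) * u0[i] / (1 - mu))
    return total / math.sqrt(n_k * p * q)

reps = 3000
draws = [draw_a_hat() for _ in range(reps)]
mean = sum(draws) / reps
var = sum((x - mean) ** 2 for x in draws) / (reps - 1)
coverage = sum(abs(x) < 1.96 * math.sqrt(v_k) for x in draws) / reps

print(f"simulated variance / v_k: {var / v_k:.3f}")
print(f"share of |a_hat / sqrt(v_k)| < 1.96: {coverage:.3f}")
```

Under these assumptions the simulated variance should be close to $v_k$ and the share of standardized draws inside $\pm 1.96$ should be near 0.95, up to simulation noise and the finite number of clusters.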

We will next prove that both $\widehat{a}_{k,1}/\sqrt{v_k}$ and $\widehat{a}_{k,0}/\sqrt{v_k}$ are $\mathcal{O}_p(1)$.

\begin{align*}
E[\widehat{a}_{k,1}^2]&=\frac{1}{n_kp_kq_k}\frac{1}{\mu_k^2}\sum_{m=1}^{m_k}E\left[\left(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(R_{k,i}W_{k,i}-p_kq_k\mu_k\big)u_{k,i}(1)\right)^2\right]\\
&\leq c\,\frac{1}{n_kp_kq_k}\sum_{m=1}^{m_k}\big(n_{k,m}p_kq_k+n_{k,m}(n_{k,m}-1)p_k^2q_k\big)\\
&=c\,\Big(1+\sum_{m=1}^{m_k}\frac{n_{k,m}(n_{k,m}-1)p_k}{n_k}\Big).
\end{align*}

Therefore,

\[
E[(\widehat{a}_{k,1}/\sqrt{v_k})^2]\leq c\left(\frac{1}{p_k\min_m n_{k,m}}+\sum_{m=1}^{m_k}\frac{(\max_m n_{k,m})(n_{k,m}-1)p_k}{n_kp_k\min_m n_{k,m}}\right).
\]

Because $\limsup_{k\rightarrow\infty}\max_m n_{k,m}/\min_m n_{k,m}<\infty$, we obtain $\limsup_{k\rightarrow\infty}E[(\widehat{a}_{k,1}/\sqrt{v_k})^2]<\infty$. As a result, $\widehat{a}_{k,1}/\sqrt{v_k}$ is $\mathcal{O}_p(1)$, and the same argument applies to $\widehat{a}_{k,0}/\sqrt{v_k}$.

Let $\widetilde{b}_{k,1}=N_{k,1}/n_k$. Consider $k$ large enough, so $p_k\min_m n_{k,m}$ is bounded away from zero, making $\widetilde{b}_{k,1}/b_{k,1}$ well-defined. Notice that $E[\widetilde{b}_{k,1}/b_{k,1}]=1$ and

\begin{align*}
\mbox{var}(\widetilde{b}_{k,1}/b_{k,1})&=\frac{1}{(n_kp_kq_k\mu_k)^2}\sum_{m=1}^{m_k}E\left[\left(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}\big(R_{k,i}W_{k,i}-p_kq_k\mu_k\big)\right)^2\right]\\
&=\frac{n_kp_kq_k\mu_k(1-p_kq_k\mu_k)}{(n_kp_kq_k\mu_k)^2}+\sum_{m=1}^{m_k}\frac{n_{k,m}(n_{k,m}-1)p_k^2q_k\big(\sigma_k^2+(1-q_k)\mu_k^2\big)}{(n_kp_kq_k\mu_k)^2}\\
&\leq\frac{1-p_kq_k\mu_k}{n_kp_kq_k\mu_k}+c\,\frac{n_k(\max_m n_{k,m}-1)p_k^2q_k}{(n_kp_kq_k)^2}\\
&\leq\frac{1-p_kq_k\mu_k}{n_kp_kq_k\mu_k}+c\,\frac{(\max_m n_{k,m}-1)}{\min_m n_{k,m}}\frac{1}{q_km_k}\longrightarrow 0.
\end{align*}

This implies $\widetilde{b}_{k,1}/b_{k,1}\stackrel{p}{\rightarrow}1$. Analogous calculations yield $\widetilde{b}_{k,0}/b_{k,0}\stackrel{p}{\rightarrow}1$. For large enough $k$, $\widetilde{b}_{k,1}/b_{k,1}=0$ if and only if $N_{k,1}=0$, which implies $\Pr(N_{k,1}=0)\rightarrow 0$. It follows that, for large enough $k$,

\[
\Pr\big(|\widetilde{b}_{k,1}/b_{k,1}-\widehat{b}_{k,1}/b_{k,1}|=0\big)=\Pr(N_{k,1}>0)\longrightarrow 1
\]

and $\widehat{b}_{k,1}/b_{k,1}\stackrel{p}{\rightarrow}1$. Using analogous calculations, we obtain $\widehat{b}_{k,0}/b_{k,0}\stackrel{p}{\rightarrow}1$. As a result,

\begin{align*}
\sqrt{n_kp_kq_k}\,(\widehat{\tau}_k-\tau_k)/v_k^{1/2}&=\frac{b_{k,1}}{\widehat{b}_{k,1}}\frac{\widehat{a}_{k,1}}{v_k^{1/2}}-\frac{b_{k,0}}{\widehat{b}_{k,0}}\frac{\widehat{a}_{k,0}}{v_k^{1/2}}\\
&=\frac{\widehat{a}_k}{v_k^{1/2}}+\left(\frac{b_{k,1}}{\widehat{b}_{k,1}}-1\right)\frac{\widehat{a}_{k,1}}{v_k^{1/2}}-\left(\frac{b_{k,0}}{\widehat{b}_{k,0}}-1\right)\frac{\widehat{a}_{k,0}}{v_k^{1/2}}\\
&=\widehat{a}_k/\sqrt{v_k}+o_p(1).
\end{align*}

Therefore,

\[
\sqrt{n_kp_kq_k}\,(\widehat{\tau}_k-\tau_k)/v_k^{1/2}\stackrel{d}{\longrightarrow}N(0,1).
\]

Using $\widetilde{b}_{k,1}/b_{k,1}\stackrel{p}{\rightarrow}1$ and $\widetilde{b}_{k,0}/b_{k,0}\stackrel{p}{\rightarrow}1$, it is easy to show $N_k/(n_kp_kq_k)\stackrel{p}{\rightarrow}1$, which implies

\[
\sqrt{N_k}\,(\widehat{\tau}_k-\tau_k)/v_k^{1/2}\stackrel{d}{\longrightarrow}N(0,1).
\]

We will next consider the case of $q_k=1$ and $\sigma_k^2=0$, where no clustering is required. Consider

\[
\vartheta_{k,i,1}=\frac{1}{\sqrt{n_kp_k}\,\mu_k}\big(R_{k,i}W_{k,i}-p_k\mu_k\big)u_{k,i}(1)
\]

and

\[
\vartheta_{k,i,0}=\frac{1}{\sqrt{n_kp_k}\,(1-\mu_k)}\big(R_{k,i}(1-W_{k,i})-p_k(1-\mu_k)\big)u_{k,i}(0).
\]

Redefine now $v_k=\sum_{i=1}^{n_k}E\big[(\vartheta_{k,i,1}-\vartheta_{k,i,0})^2\big]$. Then,

\[
v_k=\frac{1}{n_k}\sum_{i=1}^{n_k}\bigg(\frac{u^2_{k,i}(1)}{\mu_k}+\frac{u^2_{k,i}(0)}{1-\mu_k}\bigg)-p_k\frac{1}{n_k}\sum_{i=1}^{n_k}\big(u_{k,i}(1)-u_{k,i}(0)\big)^2.
\]

Notice that vksubscript𝑣𝑘v_{k} is minimized for pk=1subscript𝑝𝑘1p_{k}=1, in which case

vksubscript𝑣𝑘\displaystyle v_{k} =1nki=1nk(uk,i2(1)μk+uk,i2(0)1μk)1nki=1nk(uk,i(1)uk,i(0))2absent1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘subscriptsuperscript𝑢2𝑘𝑖1subscript𝜇𝑘subscriptsuperscript𝑢2𝑘𝑖01subscript𝜇𝑘1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘superscriptsubscript𝑢𝑘𝑖1subscript𝑢𝑘𝑖02\displaystyle=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg{(}\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg{)}-\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\big{(}u_{k,i}(1)-u_{k,i}(0)\big{)}^{2}
=1nki=1nk(1μkμkuk,i2(1)+μk1μkuk,i2(0)+2uk,i(1)uk,i(0))absent1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘1subscript𝜇𝑘subscript𝜇𝑘subscriptsuperscript𝑢2𝑘𝑖1subscript𝜇𝑘1subscript𝜇𝑘subscriptsuperscript𝑢2𝑘𝑖02subscript𝑢𝑘𝑖1subscript𝑢𝑘𝑖0\displaystyle=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg{(}\frac{1-\mu_{k}}{\mu_{k}}u^{2}_{k,i}(1)+\frac{\mu_{k}}{1-\mu_{k}}u^{2}_{k,i}(0)+2u_{k,i}(1)u_{k,i}(0)\bigg{)}
=μk(1μk)1nki=1nk(uk,i2(1)μk2+uk,i2(0)(1μk)2+2uk,i(1)uk,i(0)μk(1μk))absentsubscript𝜇𝑘1subscript𝜇𝑘1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘subscriptsuperscript𝑢2𝑘𝑖1subscriptsuperscript𝜇2𝑘subscriptsuperscript𝑢2𝑘𝑖0superscript1subscript𝜇𝑘22subscript𝑢𝑘𝑖1subscript𝑢𝑘𝑖0subscript𝜇𝑘1subscript𝜇𝑘\displaystyle=\mu_{k}(1-\mu_{k})\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg{(}\frac{u^{2}_{k,i}(1)}{\mu^{2}_{k}}+\frac{u^{2}_{k,i}(0)}{(1-\mu_{k})^{2}}+2\,\frac{u_{k,i}(1)u_{k,i}(0)}{\mu_{k}(1-\mu_{k})}\bigg{)}
=μk(1μk)1nki=1nk(uk,i(1)μk+uk,i(0)1μk)2.absentsubscript𝜇𝑘1subscript𝜇𝑘1subscript𝑛𝑘superscriptsubscript𝑖1subscript𝑛𝑘superscriptsubscript𝑢𝑘𝑖1subscript𝜇𝑘subscript𝑢𝑘𝑖01subscript𝜇𝑘2\displaystyle=\mu_{k}(1-\mu_{k})\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg{(}\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg{)}^{2}.
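The chain of equalities for the case $p_{k}=1$ can be checked numerically. The following is a minimal sketch, assuming arbitrary simulated values for $u_{k,i}(1)$, $u_{k,i}(0)$ and a hypothetical $\mu_{k}=0.3$; none of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 50, 0.3                      # hypothetical n_k and mu_k
u1 = rng.normal(size=n)              # stand-ins for u_{k,i}(1)
u0 = rng.normal(size=n)              # stand-ins for u_{k,i}(0)

# v_k at p_k = 1, as in the first line of the display
lhs = np.mean(u1**2 / mu + u0**2 / (1 - mu)) - np.mean((u1 - u0) ** 2)

# closed form from the last line of the display
rhs = mu * (1 - mu) * np.mean((u1 / mu + u0 / (1 - mu)) ** 2)

identity_gap = abs(lhs - rhs)
print(identity_gap)                  # expected to be at floating-point rounding level
```

The last line of the display is a scaled mean of squares, which makes the non-negativity of $v_{k}$ at $p_{k}=1$ immediate.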

Therefore, the assumption

\[
\liminf_{k\rightarrow\infty}\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2}>0
\]

is enough for $\liminf_{k\rightarrow\infty}v_{k}>0$. Notice now that

\begin{align*}
E[|\vartheta_{k,i,1}|^{3}] &\leq \frac{1}{(n_{k}p_{k})^{3/2}}E[|R_{k,i}W_{k,i}-p_{k}\mu_{k}|^{3}] \\
&= \frac{1}{(n_{k}p_{k})^{3/2}}\Big((1-p_{k}\mu_{k})^{3}p_{k}\mu_{k}+(p_{k}\mu_{k})^{3}(1-p_{k}\mu_{k})\Big) \\
&\leq c\,\frac{p_{k}}{(n_{k}p_{k})^{3/2}},
\end{align*}

and the same bound holds for $E[|\vartheta_{k,i,0}|^{3}]$. Therefore, for the Lyapunov condition to hold, it is enough that

\[
\frac{n_{k}p_{k}}{(n_{k}p_{k})^{3/2}}=\frac{1}{\sqrt{n_{k}p_{k}}}\longrightarrow 0,
\]

or $n_{k}p_{k}\rightarrow\infty$. That is, assumptions (i)-(iii), which we used for the clustered case, are replaced by $n_{k}p_{k}\rightarrow\infty$.
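As a small illustration of the two facts used in this step, the sketch below evaluates the third absolute central moment of a Bernoulli variable with success probability $\pi=p_{k}\mu_{k}$ and the ratio $n_{k}p_{k}/(n_{k}p_{k})^{3/2}$. The grid of probabilities and the values of $n_{k}$ and $p_{k}$ are hypothetical and chosen only for illustration.

```python
def abs_third_central_moment(pi: float) -> float:
    """E|X - pi|^3 for X ~ Bernoulli(pi), enumerating X = 1 and X = 0."""
    return pi * abs(1 - pi) ** 3 + (1 - pi) * abs(0 - pi) ** 3


def lyapunov_ratio(n: int, p: float) -> float:
    """n p / (n p)^{3/2}, which equals 1 / sqrt(n p)."""
    return (n * p) / (n * p) ** 1.5


# Hypothetical values of pi = p_k * mu_k: the moment never exceeds pi,
# which is what allows a bound of the form c * p_k.
grid = [0.01, 0.1, 0.3, 0.5, 0.9]
moment_over_pi = [abs_third_central_moment(x) / x for x in grid]

# The ratio shrinks as n_k * p_k grows (illustrative n_k with p_k = 0.25).
ratios = [lyapunov_ratio(n, 0.25) for n in (100, 10_000, 1_000_000)]
print(moment_over_pi, ratios)
```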

A.2.2 Estimation of the variance

Let $\widehat{U}_{k,i}=Y_{k,i}-\widehat{\alpha}_{k}-\widehat{\tau}_{k}W_{k,i}$ be the residuals from the regression of $Y_{k,i}$ on a constant and $W_{k,i}$. Here, $\widehat{\alpha}_{k}$ is the coefficient on the constant regressor equal to one, and $\widehat{\tau}_{k}$ is the coefficient on $W_{k,i}$. We have already shown $v_{k}^{-1/2}(\widehat{\tau}_{k}-\tau_{k})=\mathcal{O}_{p}(1/\sqrt{n_{k}p_{k}q_{k}})$. The same is true about $\widehat{\alpha}_{k}$ (e.g., apply the proof for $\widehat{\tau}_{k}$ after replacing each $y_{k,i}(1)$ with a zero). Define $\widehat{\Sigma}_{k}=\sum_{m=1}^{m_{k}}\widehat{\Sigma}_{k,m}$, where

\[
\widehat{\Sigma}_{k,m}=\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\left(\begin{array}{c}\widehat{U}_{k,i}\\ W_{k,i}\widehat{U}_{k,i}\end{array}\right)\right)\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\left(\begin{array}{c}\widehat{U}_{k,i}\\ W_{k,i}\widehat{U}_{k,i}\end{array}\right)\right)^{\prime}.
\]

Also, let

\[
\widehat{Q}_{k}=\sum_{i=1}^{n_{k}}R_{k,i}\left(\begin{array}{c}1\\ W_{k,i}\end{array}\right)\left(\begin{array}{c}1\\ W_{k,i}\end{array}\right)^{\prime},
\]

and $z=(0,1)^{\prime}$. Then, the cluster estimator of the variance of $\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})$ is

\[
\widehat{V}_{k}^{\rm cluster}=N_{k}\,z^{\prime}\widehat{Q}_{k}^{-1}\widehat{\Sigma}_{k}\widehat{Q}_{k}^{-1}z.
\]
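The definitions of $\widehat{Q}_{k}$, $\widehat{\Sigma}_{k}$, and $\widehat{V}_{k}^{\rm cluster}$ translate directly into a few lines of linear algebra. The following is a minimal sketch, assuming the arrays contain only sampled units (so $R_{k,i}=1$ for every row); the function name `cluster_variance` and the simulated data are hypothetical and not part of the paper.

```python
import numpy as np


def cluster_variance(y, w, cluster):
    """Sketch of N * z' Q^{-1} Sigma Q^{-1} z using sampled units only (R = 1)."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    cluster = np.asarray(cluster)
    X = np.column_stack([np.ones_like(w), w])      # regressors (1, W)
    Q = X.T @ X                                    # Q-hat
    coef = np.linalg.solve(Q, X.T @ y)             # (alpha-hat, tau-hat)
    resid = y - X @ coef                           # U-hat
    scores = X * resid[:, None]                    # rows (U-hat, W * U-hat)
    Sigma = np.zeros((2, 2))
    for m in np.unique(cluster):                   # sum of within-cluster outer products
        s = scores[cluster == m].sum(axis=0)
        Sigma += np.outer(s, s)
    z = np.array([0.0, 1.0])
    Qinv = np.linalg.inv(Q)
    return len(y) * (z @ Qinv @ Sigma @ Qinv @ z), coef[1]


# Illustrative data: 20 clusters of 10 units with a cluster-level shock.
rng = np.random.default_rng(1)
cluster = np.repeat(np.arange(20), 10)
w = rng.binomial(1, 0.5, size=200)
y = 1.0 + 0.5 * w + rng.normal(size=20)[cluster] + rng.normal(size=200)
v_cluster, tau_hat = cluster_variance(y, w, cluster)
print(v_cluster, tau_hat)
```

Because the regression contains only a constant and a binary regressor, `tau_hat` coincides with the difference in sample means between treated and control units.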

Notice that

\[
(n_{k}p_{k}q_{k})^{-1}E[\widehat{Q}_{k}]=\left(\begin{array}{cc}1&\mu_{k}\\ \mu_{k}&\mu_{k}\end{array}\right).
\]

In addition,

\[
\frac{1}{n_{k}p_{k}q_{k}}\widehat{Q}_{k}(2,2)=\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i},
\]

and

\begin{align*}
\mbox{var}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}\Bigg) &= n_{k,m}p_{k}q_{k}\mu_{k}(1-p_{k}q_{k}\mu_{k}) \\
&\quad + n_{k,m}(n_{k,m}-1)p_{k}^{2}q_{k}\big(\sigma_{k}^{2}+\mu_{k}^{2}(1-q_{k})\big).
\end{align*}
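The within-cluster variance formula can be cross-checked against binomial moments. The sketch below assumes a two-stage design in which a cluster is sampled with probability $q_{k}$, its units are then sampled independently with probability $p_{k}$, and treatment is assigned independently within the cluster with a cluster-level probability that has mean $\mu_{k}$ and variance $\sigma_{k}^{2}$. That reading of the design, the two-point distribution for the assignment probability, and all numeric values are assumptions made for illustration.

```python
def var_sampled_treated(n_m, p, q, a_values, a_probs):
    """Return (exact variance, formula) for sum_i R_i W_i within one cluster."""
    mu = sum(a * pr for a, pr in zip(a_values, a_probs))
    second = sum(a * a * pr for a, pr in zip(a_values, a_probs))
    sigma2 = second - mu ** 2

    # Given a sampled cluster with assignment probability a,
    # the sum is Binomial(n_m, p * a); otherwise it is zero.
    e_s = q * sum(pr * n_m * p * a for a, pr in zip(a_values, a_probs))
    e_s2 = q * sum(
        pr * (n_m * p * a * (1 - p * a) + (n_m * p * a) ** 2)
        for a, pr in zip(a_values, a_probs)
    )
    exact = e_s2 - e_s ** 2

    # Expression from the display above.
    formula = (
        n_m * p * q * mu * (1 - p * q * mu)
        + n_m * (n_m - 1) * p ** 2 * q * (sigma2 + mu ** 2 * (1 - q))
    )
    return exact, formula


exact, formula = var_sampled_treated(8, 0.4, 0.3, [0.2, 0.8], [0.5, 0.5])
print(exact, formula)
```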

Therefore, under conditions (i)-(iii), we obtain

\begin{align*}
\mbox{var}\Bigg(\frac{1}{n_{k}p_{k}q_{k}}\widehat{Q}_{k}(2,2)\Bigg) &\leq \frac{c}{n_{k}p_{k}q_{k}}\Big(1+p_{k}(\max_{m}n_{k,m}-1)\Big) \\
&= c\,\frac{\max_{m}n_{k,m}}{n_{k}q_{k}}+o(1) \\
&\leq c\,\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\,\frac{1}{q_{k}m_{k}}+o(1)\longrightarrow 0.
\end{align*}

Analogous calculations yield $\mbox{var}((n_{k}p_{k}q_{k})^{-1}\widehat{Q}_{k}(1,1))\rightarrow 0$. Therefore,

\[
\frac{1}{n_{k}p_{k}q_{k}}\widehat{Q}_{k}=\left(\begin{array}{cc}1&\mu_{k}\\ \mu_{k}&\mu_{k}\end{array}\right)+o_{p}(1)
\]

and

\[
n_{k}q_{k}p_{k}\widehat{Q}_{k}^{-1}=H_{k}+o_{p}(1),\quad\mbox{where}\quad H_{k}=\frac{1}{\mu_{k}(1-\mu_{k})}\left(\begin{array}{rc}\mu_{k}&-\mu_{k}\\ -\mu_{k}&1\end{array}\right).
\]
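A quick numerical check confirms that $H_{k}$ is the inverse of the limit matrix of $(n_{k}p_{k}q_{k})^{-1}\widehat{Q}_{k}$. This is a minimal sketch with a hypothetical value $\mu_{k}=0.35$.

```python
import numpy as np

mu = 0.35                                            # hypothetical mu_k in (0, 1)
M = np.array([[1.0, mu], [mu, mu]])                  # limit of Q-hat / (n p q)
H = np.array([[mu, -mu], [-mu, 1.0]]) / (mu * (1 - mu))

product = M @ H
print(product)                                       # should be close to the identity
```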

Now, let $U_{k,i}=Y_{k,i}-\alpha_{k}-\tau_{k}W_{k,i}=W_{k,i}u_{k,i}(1)+(1-W_{k,i})u_{k,i}(0)$. Notice that

\[
v_{k}^{-1/2}\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\leq v_{k}^{-1/2}|\widehat{\alpha}_{k}-\alpha_{k}|+v_{k}^{-1/2}|\widehat{\tau}_{k}-\tau_{k}|=\mathcal{O}_{p}(1/\sqrt{n_{k}p_{k}q_{k}}).
\]

Define $\overline{\Sigma}_{k}=\sum_{m=1}^{m_{k}}\overline{\Sigma}_{k,m}$, where

\[
\overline{\Sigma}_{k,m}=\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\left(\begin{array}{c}U_{k,i}\\ W_{k,i}U_{k,i}\end{array}\right)\right)\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\left(\begin{array}{c}U_{k,i}\\ W_{k,i}U_{k,i}\end{array}\right)\right)^{\prime}.
\]

We will show

\[
\frac{1}{n_{k}p_{k}q_{k}v_{k}}(\widehat{\Sigma}_{k}-\overline{\Sigma}_{k})\stackrel{p}{\longrightarrow}0.
\]

Notice that

\begin{align*}
\widehat{\Sigma}_{k,m}(2,2)-\overline{\Sigma}_{k,m}(2,2) &= \Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\widehat{U}_{k,i}-U_{k,i})\Bigg)^{2} \\
&\quad +2\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}U_{k,i}\Bigg)\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\widehat{U}_{k,i}-U_{k,i})\Bigg).
\end{align*}

Therefore,

\begin{align*}
\frac{1}{n_{k}p_{k}q_{k}v_{k}}|\widehat{\Sigma}_{k}(2,2)-\overline{\Sigma}_{k}(2,2)| &\leq c\,\frac{1}{n_{k}p_{k}q_{k}v_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}\Bigg)^{2} \\
&\qquad\times\Bigg(\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|^{2}+\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\Bigg).
\end{align*}

The same expression holds for the off-diagonal elements of $\widehat{\Sigma}_{k,m}-\overline{\Sigma}_{k,m}$. For $\widehat{\Sigma}_{k,m}(1,1)-\overline{\Sigma}_{k,m}(1,1)$, the expression holds once we replace each $W_{k,i}$ with a one. Let $\|\cdot\|$ be the Frobenius norm of a matrix. Then,

\begin{align*}
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\widehat{\Sigma}_{k}-\overline{\Sigma}_{k}\| &\leq c\,\frac{1}{n_{k}p_{k}q_{k}v_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigg)^{2} \\
&\qquad\times\Bigg(\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|^{2}+\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\Bigg).
\end{align*}

We will prove that the right-hand side of the previous equation converges to zero in probability. We will factorize each term into an expression that is bounded in probability and one that converges to zero in $L_{1}$.

\begin{align*}
E\Bigg[\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigg)^{2}\Bigg] &\leq n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}.
\end{align*}

For the first term, notice that

\begin{align*}
&\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|^{2}\,\frac{n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}}{n_{k}p_{k}q_{k}v_{k}} \\
&\qquad= \frac{n_{k}p_{k}q_{k}}{v_{k}}\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|^{2}\Bigg(\frac{n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}}{(n_{k}p_{k}q_{k})^{2}}\Bigg) \\
&\qquad\leq \frac{n_{k}p_{k}q_{k}}{v_{k}}\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|^{2}\Bigg(\frac{1}{n_{k}p_{k}q_{k}}+\frac{\max_{m}n_{k,m}-1}{\min_{m}n_{k,m}}\frac{1}{q_{k}m_{k}}\Bigg) \\
&\qquad= \mathcal{O}_{p}(1)\,o(1).
\end{align*}

For the second term, using the fact that $v_{k}$ is greater than or equal to $p_{k}\min_{m}n_{k,m}>0$ times a term with limit inferior that is bounded away from zero, we obtain

\begin{align*}
&\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\,\frac{n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}}{n_{k}p_{k}q_{k}v_{k}} \\
&\qquad= \Big(\frac{n_{k}p_{k}q_{k}}{v_{k}}\Big)^{1/2}\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\Bigg(\frac{n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}}{(n_{k}p_{k}q_{k})^{3/2}v_{k}^{1/2}}\Bigg) \\
&\qquad\leq \Big(\frac{n_{k}p_{k}q_{k}}{v_{k}}\Big)^{1/2}\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}-U_{k,i}|\Bigg(\frac{1}{(n_{k}p_{k}q_{k}v_{k})^{1/2}}+\frac{\max_{m}n_{k,m}-1}{\min_{m}n_{k,m}}\frac{1}{(q_{k}m_{k})^{1/2}}\Bigg) \\
&\qquad= \mathcal{O}_{p}(1)\,o(1).
\end{align*}

As a result, we obtain

\[
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\widehat{\Sigma}_{k}-\overline{\Sigma}_{k}\|=o_{p}(1).
\]

Notice that

\begin{align*}
\frac{n_{k}p_{k}q_{k}}{v_{k}}\widehat{Q}_{k}^{-1}\widehat{\Sigma}_{k}\widehat{Q}_{k}^{-1}-H_{k}\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}H_{k} &= H_{k}\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}\Big(n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}-H_{k}\Big) \\
&\quad +\Big(n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}-H_{k}\Big)\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}\Big(n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}\Big) \\
&\quad +\Big(n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}\Big)\frac{\widehat{\Sigma}_{k}-\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}\Big(n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}\Big).
\end{align*}

Therefore, to show that the left-hand side of the last equation is $o_{p}(1)$, it is only left to show that $\overline{\Sigma}_{k}/(n_{k}p_{k}q_{k}v_{k})$ is $\mathcal{O}_{p}(1)$. We will prove this next. Notice that

\begin{align*}
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\overline{\Sigma}_{k}\| &\leq c\,\frac{1}{n_{k}p_{k}q_{k}v_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigg)^{2}.
\end{align*}

Therefore,

\begin{align*}
E\Bigg[\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\overline{\Sigma}_{k}\|\Bigg] &\leq c\,\frac{1}{n_{k}p_{k}q_{k}v_{k}}\Big(n_{k}p_{k}q_{k}+n_{k}(\max_{m}n_{k,m}-1)p_{k}^{2}q_{k}\Big).
\end{align*}

Then,

\begin{align*}
E\Bigg[\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\overline{\Sigma}_{k}\|\Bigg] &\leq c\Bigg(\frac{1}{v_{k}}+\frac{p_{k}(\max_{m}n_{k,m}-1)}{p_{k}\min_{m}n_{k,m}}\Bigg)<\infty.
\end{align*}

We, therefore, obtain

\[
\frac{n_{k}p_{k}q_{k}}{v_{k}}\widehat{Q}_{k}^{-1}\widehat{\Sigma}_{k}\widehat{Q}_{k}^{-1}-H_{k}\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}H_{k}\stackrel{p}{\longrightarrow}0.
\]
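The three-term decomposition used in this step is a purely algebraic identity, so it can be verified with arbitrary matrices. In the sketch below, `A` stands in for $n_{k}p_{k}q_{k}\widehat{Q}_{k}^{-1}$, `S_hat` for $\widehat{\Sigma}_{k}/(n_{k}p_{k}q_{k}v_{k})$, and `S_bar` for its counterpart built from $U_{k,i}$; the random symmetric matrices are placeholders, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)


def random_spd():
    """Random symmetric positive definite 2x2 matrix (placeholder input)."""
    B = rng.normal(size=(2, 2))
    return B @ B.T + np.eye(2)


A = random_spd()        # stands in for n p q * Q-hat^{-1}
H = random_spd()        # stands in for H_k
S_bar = random_spd()    # stands in for Sigma-bar / (n p q v)
S_hat = random_spd()    # stands in for Sigma-hat / (n p q v)

lhs = A @ S_hat @ A - H @ S_bar @ H
rhs = (
    H @ S_bar @ (A - H)
    + (A - H) @ S_bar @ A
    + A @ (S_hat - S_bar) @ A
)
max_gap = np.max(np.abs(lhs - rhs))
print(max_gap)
```

Each term on the right-hand side contains a factor that vanishes in probability (`A - H` or `S_hat - S_bar`), which is why boundedness of the remaining factors is all that is left to show.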

Because $N_{k}/(n_{k}p_{k}q_{k})\stackrel{p}{\rightarrow}1$, we obtain

\begin{align*}
\widehat{V}_{k}^{\rm cluster}/v_{k} &= z^{\prime}H_{k}\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}H_{k}z+o_{p}(1) \\
&= \frac{1}{n_{k}p_{k}q_{k}v_{k}}\left(\frac{1}{\mu_{k}(1-\mu_{k})}\right)^{2}\sum_{m=1}^{m_{k}}\bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{2}+o_{p}(1).
\end{align*}

Recall that $U^{2}_{k,i}=u^{2}_{k,i}(1)W_{k,i}+u^{2}_{k,i}(0)(1-W_{k,i})$. Notice that

\begin{align*}
E\bigg[\bigg(\sum_{i=1}^{n_{k}}&1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{2}\bigg] \\
&= \sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}p_{k}q_{k}\mu_{k}(1-\mu_{k})\Big((1-\mu_{k})u^{2}_{k,i}(1)+\mu_{k}u^{2}_{k,i}(0)\Big) \\
&\quad +2\sum_{i=1}^{n_{k}-1}\sum_{j=i+1}^{n_{k}}1\{m_{k,i}=m_{k,j}=m\}p_{k}^{2}q_{k}\Big[(\sigma_{k}^{2}+\mu_{k}^{2})(1-\mu_{k})^{2}u_{k,i}(1)u_{k,j}(1) \\
&\qquad +\mu_{k}(1-\mu_{k})(\sigma_{k}^{2}-\mu_{k}(1-\mu_{k}))(u_{k,i}(0)u_{k,j}(1)+u_{k,i}(1)u_{k,j}(0)) \\
&\qquad +(\sigma_{k}^{2}+(1-\mu_{k})^{2})\mu_{k}^{2}u_{k,i}(0)u_{k,j}(0)\Big].
\end{align*}

Let

\[
v_{k}^{\rm cluster}=\frac{1}{n_{k}p_{k}q_{k}}\left(\frac{1}{\mu_{k}(1-\mu_{k})}\right)^{2}\sum_{m=1}^{m_{k}}E\bigg[\bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{2}\bigg].
\]

Then,

\begin{align*}
n_{k}v_{k}^{\rm cluster} &= \sum_{i=1}^{n_{k}}\bigg(\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg) \\
&\quad +p_{k}\sum_{m=1}^{m_{k}}\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2}-\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)^{2}\Bigg] \\
&\quad +p_{k}\sigma^{2}_{k}\sum_{m=1}^{m_{k}}\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)\Bigg)^{2}-\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2}\Bigg].
\end{align*}

Alternatively, we can write

\begin{align*}
n_{k}v_{k}^{\rm cluster} &= \sum_{i=1}^{n_{k}}\bigg(\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg) \\
&\quad -p_{k}\sum_{i=1}^{n_{k}}\big(u_{k,i}(1)-u_{k,i}(0)\big)^{2}-p_{k}\sigma_{k}^{2}\sum_{i=1}^{n_{k}}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)^{2} \\
&\quad +p_{k}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(u_{k,i}(1)-u_{k,i}(0)\big)\Bigg)^{2} \\
&\quad +p_{k}\sigma^{2}_{k}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\bigg(\frac{u_{k,i}(1)}{\mu_{k}}+\frac{u_{k,i}(0)}{1-\mu_{k}}\bigg)\Bigg)^{2}.
\end{align*}
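Both expressions for $n_{k}v_{k}^{\rm cluster}$ can be evaluated directly from the vectors $u_{k,i}(1)$ and $u_{k,i}(0)$ and the cluster labels. The sketch below does so for simulated inputs; the population size, cluster layout, and the values of $\mu_{k}$, $\sigma_{k}^{2}$, and $p_{k}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
m_k, n_m = 6, 5                                   # hypothetical: 6 clusters of 5 units
cluster = np.repeat(np.arange(m_k), n_m)
u1 = rng.normal(size=m_k * n_m)                   # stand-ins for u_{k,i}(1)
u0 = rng.normal(size=m_k * n_m)                   # stand-ins for u_{k,i}(0)
mu, sigma2, p = 0.4, 0.05, 0.7                    # hypothetical mu_k, sigma_k^2, p_k

d = u1 - u0                                       # u(1) - u(0)
s = u1 / mu + u0 / (1 - mu)                       # u(1)/mu + u(0)/(1 - mu)
base = np.sum(u1**2 / mu + u0**2 / (1 - mu))


def cluster_sq(x):
    """Sum over clusters of the squared within-cluster total of x."""
    return sum(x[cluster == m].sum() ** 2 for m in range(m_k))


# First expression: within-cluster cross-products only.
first = (base
         + p * (cluster_sq(d) - np.sum(d**2))
         + p * sigma2 * (cluster_sq(s) - np.sum(s**2)))

# Alternative expression: unit-level terms separated from cluster totals.
second = (base
          - p * np.sum(d**2) - p * sigma2 * np.sum(s**2)
          + p * cluster_sq(d) + p * sigma2 * cluster_sq(s))
print(first, second)
```

With clusters of size one, `cluster_sq(x)` equals `np.sum(x**2)`, so the cross-product terms in the first expression vanish and only the leading sum remains.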

We will next show that

\[
z^{\prime}H_{k}\frac{\overline{\Sigma}_{k}}{n_{k}p_{k}q_{k}v_{k}}H_{k}z-\frac{v_{k}^{\rm cluster}}{v_{k}}\stackrel{p}{\longrightarrow}0.
\]

Given that $\mu_{k}(1-\mu_{k})$ is bounded away from zero, by the weak law of large numbers for arrays, it is enough to show

\[
\frac{1}{(n_{k}p_{k}q_{k}v_{k})^{2}}\sum_{m=1}^{m_{k}}E\bigg[\bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{4}\bigg]\longrightarrow 0.
\]

Applying the multinomial theorem and the fact that all moments of $W_{k,i}$ as well as all potential outcomes are bounded, we obtain:

\begin{align*}
\frac{1}{(n_{k}p_{k}q_{k}v_{k})^{2}}&\sum_{m=1}^{m_{k}}E\bigg[\bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{4}\bigg] \\
&\leq \frac{c}{(n_{k}p_{k}q_{k}v_{k})^{2}}\Big(n_{k}p_{k}q_{k}+n_{k}p_{k}^{2}q_{k}\max_{m}n_{k,m}+n_{k}p_{k}^{3}q_{k}\max_{m}n^{2}_{k,m}+n_{k}p_{k}^{4}q_{k}\max_{m}n^{3}_{k,m}\Big).
\end{align*}

Now, using $\limsup_{k\rightarrow\infty}\max_{m}n_{k,m}/\min_{m}n_{k,m}<\infty$, $\limsup_{k\rightarrow\infty}p_{k}\min_{m}n_{k,m}/v_{k}<\infty$, and $q_{k}m_{k}\rightarrow\infty$, we obtain

\begin{align*}
\frac{1}{(n_{k}p_{k}q_{k}v_{k})^{2}}&\sum_{m=1}^{m_{k}}E\bigg[\bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\mu_{k})U_{k,i}\bigg)^{4}\bigg] \\
&\leq c\left(\frac{1}{n_{k}p_{k}q_{k}v_{k}^{2}}+\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\frac{1}{q_{k}m_{k}v_{k}^{2}}+\frac{p_{k}\max_{m}n^{2}_{k,m}}{v_{k}\min_{m}n_{k,m}}\frac{1}{q_{k}m_{k}v_{k}}+\frac{p_{k}^{2}\max_{m}n^{3}_{k,m}}{v^{2}_{k}\min_{m}n_{k,m}}\frac{1}{q_{k}m_{k}}\right) \\
&\longrightarrow 0.
\end{align*}

As a result,

\[
\frac{\widehat{V}_{k}^{\rm cluster}}{v_{k}}=\frac{v_{k}^{\rm cluster}}{v_{k}}+o_{p}(1).
\]

The robust (sandwich) estimator of the variance of $\sqrt{N_{k}}(\widehat{\tau}_{k}-\tau_{k})$ is given by

\[
\widehat{V}_{k}^{\rm robust}=N_{k}\,z^{\prime}\widehat{Q}_{k}^{-1}\widehat{\Omega}_{k}\widehat{Q}_{k}^{-1}z,
\]

where

\[
\widehat{\Omega}_{k}=\sum_{i=1}^{n_{k}}R_{k,i}\left(\begin{array}{c}\widehat{U}_{k,i}\\ W_{k,i}\widehat{U}_{k,i}\end{array}\right)\left(\begin{array}{c}\widehat{U}_{k,i}\\ W_{k,i}\widehat{U}_{k,i}\end{array}\right)^{\prime}.
\]

We will derive the limit of $\widehat{V}_{k}^{\rm robust}/v_{k}$. Let

\[
\overline{\Omega}_{k}=\sum_{i=1}^{n_{k}}R_{k,i}\left(\begin{array}{c}U_{k,i}\\ W_{k,i}U_{k,i}\end{array}\right)\left(\begin{array}{c}U_{k,i}\\ W_{k,i}U_{k,i}\end{array}\right)^{\prime}.
\]

Because potential outcomes (and $W_{k,i}$) are bounded, we obtain

\[
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\widehat{\Omega}_{k}-\overline{\Omega}_{k}\|\leq c\left(\frac{1}{n_{k}p_{k}q_{k}v_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\right)\max_{i=1,\ldots,n_{k}}|\widehat{U}_{k,i}^{2}-U_{k,i}^{2}|.
\]

Because the limsup of the expectation of the first factor (which is non-negative) is bounded and the second factor converges to zero in probability as proved above, we obtain

\[
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\widehat{\Omega}_{k}-\overline{\Omega}_{k}\|=o_{p}(1).
\]

Notice that

\[
\frac{1}{n_{k}p_{k}q_{k}v_{k}}\|\overline{\Omega}_{k}\|\leq c\left(\frac{1}{n_{k}p_{k}q_{k}v_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\right).
\]

Again, the limsup of the expectation of the right-hand side of this equation is non-negative and bounded. As a result, we obtain $\|\overline{\Omega}_{k}\|/(n_{k}p_{k}q_{k}v_{k})=\mathcal{O}_{p}(1)$.

\begin{align*}
\widehat{V}_{k}^{\rm robust}/v_{k}&=z^{\prime}H_{k}\frac{\overline{\Omega}_{k}}{n_{k}p_{k}q_{k}v_{k}}H_{k}z+o_{p}(1)\\
&=\frac{1}{n_{k}p_{k}q_{k}v_{k}}\left(\frac{1}{\mu_{k}(1-\mu_{k})}\right)^{2}\sum_{i=1}^{n_{k}}R_{k,i}(W_{k,i}-\mu_{k})^{2}U_{k,i}^{2}+o_{p}(1).
\end{align*}

Notice that

\[
E\bigg[\sum_{i=1}^{n_{k}}R_{k,i}(W_{k,i}-\mu_{k})^{2}U_{k,i}^{2}\bigg]=\sum_{i=1}^{n_{k}}p_{k}q_{k}\mu_{k}(1-\mu_{k})\Big((1-\mu_{k})u^{2}_{k,i}(1)+\mu_{k}u^{2}_{k,i}(0)\Big).
\]

Finally, notice that

\begin{align*}
\frac{1}{(n_{k}p_{k}q_{k}v_{k})^{2}}\sum_{m=1}^{m_{k}}E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}R_{k,i}(W_{k,i}-\mu_{k})^{2}U_{k,i}^{2}\Bigg)^{2}\Bigg]&\leq c\,\frac{n_{k}p_{k}q_{k}+n_{k}p_{k}^{2}q_{k}\max_{m}n_{k,m}}{(n_{k}p_{k}q_{k}v_{k})^{2}}\\
&\leq c\Bigg(\frac{1}{n_{k}p_{k}q_{k}v_{k}^{2}}+\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\frac{1}{q_{k}m_{k}v_{k}^{2}}\Bigg)\\
&\longrightarrow 0.
\end{align*}

Therefore, by the weak law of large numbers for arrays, we obtain

\[
\frac{\widehat{V}_{k}^{\rm robust}}{v_{k}}=\frac{v_{k}^{\rm robust}}{v_{k}}+o_{p}(1),
\]

where

\[
v_{k}^{\rm robust}=\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}\bigg(\frac{u^{2}_{k,i}(1)}{\mu_{k}}+\frac{u^{2}_{k,i}(0)}{1-\mu_{k}}\bigg).
\]
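As a numerical illustration, the robust estimator $\widehat{V}_{k}^{\rm robust}$ analysed above can be evaluated directly on the sampled units. The sketch below is not the authors' code. It assumes $\widehat{Q}_{k}=\sum_{i}R_{k,i}x_{k,i}x_{k,i}^{\prime}$ with $x_{k,i}=(1,W_{k,i})^{\prime}$, $z=(0,1)^{\prime}$, and $\widehat{U}_{k,i}$ equal to the least-squares residuals, none of which is restated in this section; the function name `robust_variance` and the data are hypothetical.

```python
def robust_variance(y, w):
    """Sandwich variance of sqrt(N) * (tau_hat - tau) for OLS of y on (1, w).

    y, w: outcomes and binary treatments of the sampled units only.
    Assumes Q_hat = sum_i x_i x_i' with x_i = (1, w_i)' and z = (0, 1)'.
    """
    n = len(y)
    n1 = sum(w)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need both treated and control units")
    mean1 = sum(yi for yi, wi in zip(y, w) if wi == 1) / n1
    mean0 = sum(yi for yi, wi in zip(y, w) if wi == 0) / n0
    # OLS residuals are deviations from the arm-specific sample means.
    resid = [yi - (mean1 if wi == 1 else mean0) for yi, wi in zip(y, w)]

    # Q_hat = [[n, n1], [n1, n1]], inverted in closed form.
    det = n * n1 - n1 * n1
    q_inv = [[n1 / det, -n1 / det], [-n1 / det, n / det]]

    # Omega_hat = sum_i resid_i^2 * x_i x_i'.
    omega = [[0.0, 0.0], [0.0, 0.0]]
    for ui, wi in zip(resid, w):
        x = (1.0, float(wi))
        for a in range(2):
            for b in range(2):
                omega[a][b] += ui * ui * x[a] * x[b]

    # z' Q_hat^{-1} with z = (0, 1)' is the second row of the inverse.
    row = q_inv[1]
    quad = sum(row[a] * omega[a][b] * row[b] for a in range(2) for b in range(2))
    return n * quad


# Hypothetical sample: three treated and three control units.
y = [3.0, 1.0, 4.0, 1.5, 5.0, 2.0]
w = [1, 0, 1, 0, 1, 0]
v_robust = robust_variance(y, w)
```

Under these assumptions the quadratic form collapses to $N(\sum_{W_i=1}\widehat{U}_i^{2}/N_1^{2}+\sum_{W_i=0}\widehat{U}_i^{2}/N_0^{2})$, where $N_1$ and $N_0$ are the numbers of sampled treated and control units, which gives a quick check on the matrix computation.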

A.3 Fixed effects

A.3.1 Large $k$ distribution

Let

\[
\overline{N}_{k,m}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}
\]

and

\[
\widehat{\tau}_{k}^{\rm\,fixed}=\frac{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}Y_{k,i}(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})},
\tag{A.5}
\]

where

\[
\overline{W}_{k,m}=\frac{1}{\overline{N}_{k,m}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}.
\]

Notice that we need $\liminf_{k\rightarrow\infty}\big(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k}\big)=\liminf_{k\rightarrow\infty}E[A_{k,m}(1-A_{k,m})]>0$ for this estimator to be well-defined in large samples (otherwise, the denominator in the formula for $\widehat{\tau}_{k}^{\rm\,fixed}$ could be equal to zero). Although it is not strictly necessary, we will assume that the supports of the cluster probabilities, $A_{k,m}$, are bounded away from zero and one (uniformly in $k$ and $m$), because doing so entails little loss of generality and simplifies the exposition. In finite samples we assign $\widehat{\tau}_{k}^{\rm\,fixed}=0$ to the cases when the denominator of $\widehat{\tau}_{k}^{\rm\,fixed}$ in equation (A.5) is equal to zero. Notice that

\[
\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\overline{W}_{k,m})=0.
\]
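The estimator in equation (A.5) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it takes only the sampled units ($R_{k,i}=1$) as input, so clusters with no sampled units never appear and the $\vee\,1$ guard is not needed. The function name `tau_fixed` and the toy data are hypothetical.

```python
from collections import defaultdict


def tau_fixed(y, w, cluster):
    """Within-cluster (fixed-effects) estimator of equation (A.5).

    y, w, cluster: outcomes, binary treatments, and cluster labels of the
    sampled units (R = 1). Returns 0.0 when the denominator is zero.
    """
    total_w = defaultdict(float)
    count = defaultdict(int)
    for wi, mi in zip(w, cluster):
        total_w[mi] += wi
        count[mi] += 1
    # Treated share among the sampled units of each cluster (W-bar).
    w_bar = {m: total_w[m] / count[m] for m in count}

    num = sum(yi * (wi - w_bar[mi]) for yi, wi, mi in zip(y, w, cluster))
    den = sum(wi * (wi - w_bar[mi]) for wi, mi in zip(w, cluster))
    return num / den if den != 0 else 0.0


# Toy data with cluster intercepts and a constant treatment effect of 2.
w = [1, 0, 1, 0, 1, 0, 0]
cluster = ["a", "a", "a", "b", "b", "b", "b"]
alpha = {"a": 1.0, "b": 5.0}
y = [alpha[m] + 2.0 * wi for wi, m in zip(w, cluster)]
tau_hat = tau_fixed(y, w, cluster)  # equals 2.0 up to floating-point error
```

Because the demeaned treatments sum to zero within each cluster, the cluster intercepts drop out of the numerator, which is why the toy example recovers the constant effect. A sample in which every cluster is entirely treated or entirely untreated has a zero denominator and returns 0.0, matching the finite-sample convention above.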

Let

\[
\alpha_{k,m}=\frac{1}{n_{k,m}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}y_{k,i}(0),\quad\tau_{k,m}=\frac{1}{n_{k,m}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(y_{k,i}(1)-y_{k,i}(0)),
\]

$e_{k,i}(0)=y_{k,i}(0)-\alpha_{k,m_{k,i}}$, and $e_{k,i}(1)=y_{k,i}(1)-\alpha_{k,m_{k,i}}-\tau_{k,m_{k,i}}$. It follows that

\[
\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}(1)=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}(0)=0.
\]

Now, $Y_{k,i}=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+\alpha_{k,m_{k,i}}+\tau_{k,m_{k,i}}W_{k,i}$. Then,

\[
\widehat{\tau}_{k}^{\rm\,fixed}=\frac{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big((e_{k,i}(1)+\tau_{k,m})W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}.
\]

Let

\[
\overline{\tau}_{k}=\frac{\displaystyle\sum_{m=1}^{m_{k}}\tau_{k,m}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})},
\tag{A.6}
\]

where, as before, we set $\overline{\tau}_{k}=0$ if the denominator on the right-hand side of (A.6) is equal to zero. Now, $\widehat{\tau}_{k}^{\rm\,fixed}-\tau_{k}=(\widehat{\tau}_{k}^{\rm\,fixed}-\overline{\tau}_{k})+(\overline{\tau}_{k}-\tau_{k})$, where

\[
\widehat{\tau}_{k}^{\rm\,fixed}-\overline{\tau}_{k}=\frac{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}
\]

and

\[
\overline{\tau}_{k}-\tau_{k}=\frac{\displaystyle\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}{\displaystyle\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}.
\]

Notice that outcomes enter the term $\widehat{\tau}_{k}^{\rm\,fixed}-\overline{\tau}_{k}$ only through the intra-cluster errors, $e_{k,i}(1)$ and $e_{k,i}(0)$. In contrast, the term $\overline{\tau}_{k}-\tau_{k}$ depends on outcomes only through inter-cluster variability in treatment effects, $\tau_{k,m}-\tau_{k}$. The numerator in the expression for $\overline{\tau}_{k}-\tau_{k}$ in the last displayed equation does not have mean zero in general, and this will be reflected in a bias term, $B_{k}$, which we define next. Let

\[
D_{k}=\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m}),
\]

and

\[
B_{k}=-\frac{\displaystyle\frac{1}{n_{k}p_{k}}E[A_{k,m}(1-A_{k,m})]\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})(1-(1-p_{k})^{n_{k,m}})}{\displaystyle\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})}.
\]

Then, $\sqrt{n_{k}p_{k}q_{k}}(\widehat{\tau}_{k}^{\rm\,fixed}-\tau_{k}-B_{k})=F_{k}/D_{k}$, where

\[
F_{k}=\sum_{m=1}^{m_{k}}\big[(\psi_{k,m}-\overline{\psi}_{k,m})+(\varphi_{k,m}-\overline{\varphi}_{k,m})\big],
\]

\begin{align*}
\psi_{k,m}&=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(W_{k,i}-A_{k,m}),\\
\overline{\psi}_{k,m}&=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(\overline{W}_{k,m}-A_{k,m}),\\
\varphi_{k,m}&=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\big(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}q_{k}E[A_{k,m}(1-A_{k,m})]\big),
\end{align*}

and

\begin{align*}
\overline{\varphi}_{k,m}=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}(\tau_{k,m}-\tau_{k})\Bigg(&\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\\
&-q_{k}E[A_{k,m}(1-A_{k,m})](1-(1-p_{k})^{n_{k,m}})\Bigg).
\end{align*}

The terms $\psi_{k,m}$ and $\overline{\psi}_{k,m}$ depend on the within-cluster errors $e_{k,i}(1)$ and $e_{k,i}(0)$. The terms $\varphi_{k,m}$ and $\overline{\varphi}_{k,m}$ depend on the inter-cluster errors $\tau_{k,m}-\tau_{k}$. The terms $\psi_{k,m}$ and $\varphi_{k,m}$ replace $\overline{W}_{k,m}$ with $A_{k,m}$, while $\overline{\psi}_{k,m}$ and $\overline{\varphi}_{k,m}$ correct for the difference, $\overline{W}_{k,m}-A_{k,m}$.

It can be seen (in intermediate calculations below) that

\[
E\Bigg[\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg]=n_{k,m}p_{k}q_{k}E[A_{k,m}(1-A_{k,m})]
\]

and

\[
E\Bigg[\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg]=q_{k}E[A_{k,m}(1-A_{k,m})](1-(1-p_{k})^{n_{k,m}}).
\]

These two expectations are subtracted in $\varphi_{k,m}$ and $\overline{\varphi}_{k,m}$, so $\varphi_{k,m}$ and $\overline{\varphi}_{k,m}$ have mean zero. Doing so for $\varphi_{k,m}$ does not require adjustments elsewhere. Because

\[
\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})n_{k,m}=0,
\]

the $n_{k,m}p_{k}q_{k}E[A_{k,m}(1-A_{k,m})]$ terms do not change the sum $F_{k}$. In contrast, demeaning $\overline{\varphi}_{k,m}$ creates the bias term $B_{k}$. If the size of the clusters $n_{k,m}$ does not vary across clusters, then $B_{k}$ is equal to zero. More generally, $\sqrt{n_{k}p_{k}q_{k}}D_{k}B_{k}=\mathcal{O}(m_{k}\sqrt{q_{k}/(n_{k}p_{k})})$. Therefore, if

\[
\frac{m_{k}q_{k}}{p_{k}(n_{k}/m_{k})}\longrightarrow 0
\tag{A.7}
\]

(that is, if the expected number of sampled clusters is small relative to the expected number of sampled observations per sampled cluster), then $\sqrt{n_{k}p_{k}q_{k}}D_{k}B_{k}$ converges to zero. As a result, $\sqrt{n_{k}p_{k}q_{k}}B_{k}$ converges in probability to zero, because, as we will show later, $D_{k}$ converges in probability to $\mu_{k}(1-\mu_{k})-\sigma_{k}^{2}$, which is bounded away from zero. In our large sample analysis, we will assume that the expected number of sampled clusters grows to infinity, $m_{k}q_{k}\rightarrow\infty$. Then, equation (A.7) implies that the expected number of observations per sampled cluster goes to infinity, $p_{k}(n_{k}/m_{k})\rightarrow\infty$. Notice also that $n_{k}p_{k}q_{k}=(n_{k}p_{k}/m_{k})(m_{k}q_{k})\rightarrow\infty$.
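The role of cluster-size variation in $B_{k}$ can be illustrated numerically. The sketch below evaluates only the sum $\sum_{m}(\tau_{k,m}-\tau_{k})(1-(1-p_{k})^{n_{k,m}})$ from the numerator of $B_{k}$, taking $\tau_{k}$ to be the size-weighted average of the $\tau_{k,m}$ so that $\sum_{m}(\tau_{k,m}-\tau_{k})n_{k,m}=0$. The function name and all numerical values are hypothetical.

```python
def bias_numerator_sum(tau_m, n_m, p):
    """Sum over clusters of (tau_m - tau) * (1 - (1 - p) ** n_m).

    tau_m: cluster average treatment effects; n_m: cluster sizes;
    p: unit sampling probability. tau is the size-weighted average of
    tau_m, so that sum_m (tau_m - tau) * n_m = 0.
    """
    n = sum(n_m)
    tau = sum(t * s for t, s in zip(tau_m, n_m)) / n
    return sum((t - tau) * (1.0 - (1.0 - p) ** s) for t, s in zip(tau_m, n_m))


# Equal cluster sizes: the factor (1 - (1 - p) ** n_m) is constant in m,
# so the sum is a multiple of sum_m (tau_m - tau) and vanishes.
equal_sizes = bias_numerator_sum([1.0, 2.0, 3.0], [50, 50, 50], p=0.1)

# Unequal cluster sizes: the factor varies with m and the sum is nonzero.
unequal_sizes = bias_numerator_sum([1.0, 2.0, 3.0], [5, 50, 500], p=0.1)
```

Each factor $1-(1-p_{k})^{n_{k,m}}$ lies in $[0,1]$, so the sum is bounded by a constant times $m_{k}$ when treatment effects are bounded, which is consistent with the order of $\sqrt{n_{k}p_{k}q_{k}}D_{k}B_{k}$ stated above.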

We now summarize the assumptions made thus far. We first assumed that the supports of the cluster probabilities, $A_{k,m}$, are bounded away from zero and one (uniformly in $k$ and $m$), and that potential outcomes are bounded. Moreover, we assumed $m_{k}q_{k}\rightarrow\infty$ and $(m_{k}q_{k})/((p_{k}n_{k})/m_{k})\rightarrow 0$. These imply $(p_{k}n_{k})/m_{k}\rightarrow\infty$ and $n_{k}p_{k}q_{k}\rightarrow\infty$. We will add the assumption that the ratio between maximum and minimum cluster size is bounded, $\limsup_{k\rightarrow\infty}\max_{m}n_{k,m}/\min_{m}n_{k,m}<\infty$. This assumption implies $p_{k}\min_{m}n_{k,m}\rightarrow\infty$ and $(m_{k}q_{k})/(p_{k}\min_{m}n_{k,m})\rightarrow 0$.

We will now study the behavior of $D_{k}$. Notice that

\begin{align*}
E\Bigg[\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}&1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg]\\
&=E\Bigg[\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg]\\
&\quad-E\Bigg[\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg]\\
&=n_{k}p_{k}q_{k}E[A_{k,m}(1-A_{k,m})]-q_{k}E[A_{k,m}(1-A_{k,m})]\sum_{m=1}^{m_{k}}(1-(1-p_{k})^{n_{k,m}}).
\end{align*}

In addition,

\begin{align*}
\frac{1}{(n_{k}p_{k}q_{k})^{2}}\sum_{m=1}^{m_{k}}E\Bigg[\Bigg(&\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Bigg]\\
&\leq c\,\frac{n_{k}p_{k}q_{k}+n_{k}p_{k}^{2}q_{k}\max_{m}n_{k,m}}{(n_{k}p_{k}q_{k})^{2}}\\
&\leq c\,\Bigg(\frac{1}{n_{k}p_{k}q_{k}}+\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\frac{1}{m_{k}q_{k}}\Bigg)\longrightarrow 0.
\end{align*}

The weak law of large numbers for arrays implies

\[
D_{k}-E[A_{k,m}(1-A_{k,m})]+\frac{1}{n_{k}p_{k}}E[A_{k,m}(1-A_{k,m})]\sum_{m=1}^{m_{k}}(1-(1-p_{k})^{n_{k,m}})\stackrel{p}{\longrightarrow}0.
\]

Because $m_{k}/(n_{k}p_{k})\rightarrow 0$ and $E[A_{k,m}(1-A_{k,m})]=\mu_{k}(1-\mu_{k})-\sigma_{k}^{2}$, we obtain

\[
D_{k}-(\mu_{k}(1-\mu_{k})-\sigma_{k}^{2})\stackrel{p}{\longrightarrow}0.
\]
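The two facts used in this step can be spelled out as follows. The worked equations below assume, as the notation suggests but this section does not restate, that $\mu_{k}=E[A_{k,m}]$ and $\sigma_{k}^{2}=\mathrm{Var}(A_{k,m})$.

```latex
\begin{align*}
E[A_{k,m}(1-A_{k,m})]
  &= E[A_{k,m}] - E[A_{k,m}^{2}] \\
  &= \mu_{k} - (\sigma_{k}^{2} + \mu_{k}^{2}) \\
  &= \mu_{k}(1-\mu_{k}) - \sigma_{k}^{2}.
\end{align*}
% Each summand 1-(1-p_k)^{n_{k,m}} lies in [0,1] and A(1-A) <= 1/4, so
\[
0 \le \frac{1}{n_{k}p_{k}}\,E[A_{k,m}(1-A_{k,m})]
      \sum_{m=1}^{m_{k}}\bigl(1-(1-p_{k})^{n_{k,m}}\bigr)
  \le \frac{m_{k}}{4\,n_{k}p_{k}} \longrightarrow 0.
\]
```

The second display is why $m_{k}/(n_{k}p_{k})\rightarrow 0$ suffices to drop the correction term from the limit of $D_{k}$.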

We now turn our attention to $F_{k}$. We will first calculate the variance of $\psi_{k,m}$. Let $Q_{k,m}$ be a binary variable that takes value one if cluster $m$ in population $k$ is sampled, and zero otherwise. Notice that

\[
E[R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\,|\,A_{k,m},Q_{k,m}=1,m_{k,i}=m]=p_{k}A_{k,m}(1-A_{k,m}),
\]

and

\[
E[R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m})\,|\,A_{k,m},Q_{k,m}=1,m_{k,i}=m]=-p_{k}A_{k,m}(1-A_{k,m}).
\]
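These two conditional expectations follow from $W_{k,i}$ being binary. The short derivation below assumes that, conditional on $A_{k,m}$ and $Q_{k,m}=1$, the indicators $R_{k,i}$ and $W_{k,i}$ are independent with $\Pr(R_{k,i}=1)=p_{k}$ and $\Pr(W_{k,i}=1)=A_{k,m}$; this is the sampling and assignment scheme the notation suggests, and it is not restated in this section. The conditioning event is abbreviated as $\cdot$.

```latex
% W binary implies W^2 = W and W(1-W) = 0, hence
%   W(W - A) = W(1 - A)   and   (1 - W)(W - A) = -(1 - W)A.
\begin{align*}
E[R_{k,i}W_{k,i}(W_{k,i}-A_{k,m}) \mid \cdot\,]
  &= (1-A_{k,m})\,E[R_{k,i} \mid \cdot\,]\,E[W_{k,i} \mid \cdot\,]
   = p_{k}A_{k,m}(1-A_{k,m}), \\
E[R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m}) \mid \cdot\,]
  &= -A_{k,m}\,E[R_{k,i} \mid \cdot\,]\,E[1-W_{k,i} \mid \cdot\,]
   = -p_{k}A_{k,m}(1-A_{k,m}).
\end{align*}
```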

Consider now

\begin{align*}
\psi_{k,m,1}&=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})e_{k,i}(1)\\
&=\frac{Q_{k,m}}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\Big(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}A_{k,m}(1-A_{k,m})\Big)e_{k,i}(1),
\end{align*}

and

\begin{align*}
\psi_{k,m,0}&=\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m})e_{k,i}(0)\\
&=\frac{Q_{k,m}}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\Big(R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m})+p_{k}A_{k,m}(1-A_{k,m})\Big)e_{k,i}(0).
\end{align*}

It holds that $\psi_{k,m}=\psi_{k,m,1}+\psi_{k,m,0}$ and $E[\psi_{k,m}]=0$. Now, notice that

\begin{align*}
E[\psi_{k,m,1}^{2}]&=\frac{1}{n_{k}}E[A_{k,m}(1-A_{k,m})^{2}-p_{k}A_{k,m}^{2}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(1),\\
E[\psi_{k,m,0}^{2}]&=\frac{1}{n_{k}}E[A^{2}_{k,m}(1-A_{k,m})-p_{k}A_{k,m}^{2}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(0),
\end{align*}

and

\[
E[\psi_{k,m,1}\psi_{k,m,0}]=\frac{1}{n_{k}}p_{k}E[A_{k,m}^{2}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}(1)e_{k,i}(0).
\]

Therefore,

$$
\begin{aligned}
E[(\psi_{k,m,1}+\psi_{k,m,0})^{2}] &= \frac{1}{n_k} E[A_{k,m}(1-A_{k,m})^{2}] \sum_{i=1}^{n_k} 1\{m_{k,i}=m\}\, e_{k,i}^{2}(1) \\
&\quad + \frac{1}{n_k} E[A_{k,m}^{2}(1-A_{k,m})] \sum_{i=1}^{n_k} 1\{m_{k,i}=m\}\, e_{k,i}^{2}(0) \\
&\quad - \frac{1}{n_k}\, p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] \sum_{i=1}^{n_k} 1\{m_{k,i}=m\} \big(e_{k,i}(1)-e_{k,i}(0)\big)^{2},
\end{aligned}
$$

and

$$
\begin{aligned}
\sum_{m=1}^{m_k} E[(\psi_{k,m,1}+\psi_{k,m,0})^{2}] &= E[A_{k,m}(1-A_{k,m})^{2}]\, \frac{1}{n_k} \sum_{i=1}^{n_k} e_{k,i}^{2}(1) \\
&\quad + E[A_{k,m}^{2}(1-A_{k,m})]\, \frac{1}{n_k} \sum_{i=1}^{n_k} e_{k,i}^{2}(0) \\
&\quad - p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}]\, \frac{1}{n_k} \sum_{i=1}^{n_k} \big(e_{k,i}(1)-e_{k,i}(0)\big)^{2}.
\end{aligned}
\tag{A.8}
$$
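The step from the three second moments to (A.8) can be spelled out as follows. This is only a sketch using quantities already defined; the shorthand $c_{k}$ is introduced here for illustration and is not part of the original derivation.

$$
\begin{aligned}
E[(\psi_{k,m,1}+\psi_{k,m,0})^{2}] &= E[\psi_{k,m,1}^{2}] + E[\psi_{k,m,0}^{2}] + 2E[\psi_{k,m,1}\psi_{k,m,0}], \\
-c_{k}\big(e_{k,i}^{2}(1) + e_{k,i}^{2}(0) - 2 e_{k,i}(1) e_{k,i}(0)\big) &= -c_{k}\big(e_{k,i}(1)-e_{k,i}(0)\big)^{2}, \qquad c_{k} = \frac{p_k}{n_k} E[A_{k,m}^{2}(1-A_{k,m})^{2}].
\end{aligned}
$$

Summing over $m$ then uses $\sum_{m=1}^{m_k} 1\{m_{k,i}=m\} = 1$ for every unit $i$, which removes the indicators and gives (A.8).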

We will next show that the terms $\overline{\psi}_{k,m}$ do not matter for the asymptotic distribution of $\sqrt{n_k p_k q_k}\,(\widehat{\tau}_k-\tau_k)$. Notice that, because the cluster sum of $e_{k,i}(1)$ is equal to zero, we obtain $E[\overline{\psi}_{k,m}]=0$ and, therefore,

$$
\sum_{m=1}^{m_k} E\big[\overline{\psi}_{k,m}\big] = 0.
$$

Moreover,

$$
2\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k} 1\{m_{k,i}=m_{k,j}=m\}\, e_{k,i}(1)\, e_{k,j}(1) = -\sum_{i=1}^{n_k} 1\{m_{k,i}=m\}\, e_{k,i}^{2}(1) \leq 0.
$$

In addition, $E[R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})^{2}\,|\,m_{k,i}=m] \leq q_k E[A_{k,m}(1-A_{k,m})]/n_{k,m}$ (see intermediate calculations). Therefore,

$$
\begin{aligned}
&E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\, e_{k,i}(1)\Bigg)^{2}\Bigg] \\
&\quad= \sum_{i=1}^{n_k} 1\{m_{k,i}=m\}\, E\Big[R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})^{2}\,\Big|\,m_{k,i}=m\Big]\, e_{k,i}^{2}(1) \\
&\qquad + 2\sum_{i=1}^{n_k-1}\sum_{j=i+1}^{n_k} E\Big[1\{m_{k,i}=m_{k,j}=m\} R_{k,i}R_{k,j}W_{k,i}W_{k,j}(\overline{W}_{k,m}-A_{k,m})^{2}\Big]\, e_{k,i}(1)\, e_{k,j}(1) \\
&\quad\leq q_k E[A_{k,m}(1-A_{k,m})]\, \frac{1}{n_{k,m}} \sum_{i=1}^{n_k} 1\{m_{k,i}=m\}\, e_{k,i}^{2}(1).
\end{aligned}
$$

Now, because errors are bounded, we obtain

$$
\sum_{m=1}^{m_k} E\Bigg[\Bigg(\frac{1}{\sqrt{n_k p_k q_k}} \sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\, e_{k,i}(1)\Bigg)^{2}\Bigg] \leq c\, \frac{m_k}{n_k p_k}.
\tag{A.9}
$$

Because $m_k/(n_k p_k)\rightarrow 0$, the weak law of large numbers for arrays implies

$$
\frac{1}{\sqrt{n_k p_k q_k}} \sum_{m=1}^{m_k} \sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\, e_{k,i}(1) \stackrel{p}{\longrightarrow} 0,
$$

with the analogous result involving the errors $e_{k,i}(0)$. It follows that

$$
\sum_{m=1}^{m_k} \overline{\psi}_{k,m} \stackrel{p}{\longrightarrow} 0.
$$

Consider now $\varphi_{k,m}$. Notice that

$$
\begin{aligned}
&E\Big[\Big(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m}) - p_k q_k E[A_{k,m}(1-A_{k,m})]\Big)^{2}\Big] \\
&\quad= p_k q_k E[A_{k,m}(1-A_{k,m})^{2}] - p_k^{2} q_k^{2} \big(E[A_{k,m}(1-A_{k,m})]\big)^{2},
\end{aligned}
$$

and

$$
\begin{aligned}
&E\Big[\Big(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m}) - p_k q_k E[A_{k,m}(1-A_{k,m})]\Big) \\
&\qquad\times \Big(R_{k,j}W_{k,j}(W_{k,j}-A_{k,m}) - p_k q_k E[A_{k,m}(1-A_{k,m})]\Big)\,\Big|\, m_{k,i}=m_{k,j}=m\Big] \\
&\quad= p_k^{2} q_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k^{2} q_k^{2} \big(E[A_{k,m}(1-A_{k,m})]\big)^{2}.
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
E[\varphi_{k,m}^{2}] &= \Big(E[A_{k,m}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + \Big(p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \frac{n_{k,m}(n_{k,m}-1)}{n_k} (\tau_{k,m}-\tau_k)^{2},
\end{aligned}
$$

and

$$
\begin{aligned}
\sum_{m=1}^{m_k} E[\varphi_{k,m}^{2}] &= \Big(E[A_{k,m}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + \Big(p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}(n_{k,m}-1)}{n_k} (\tau_{k,m}-\tau_k)^{2}.
\end{aligned}
$$

Next, we calculate the variance of $\overline{\varphi}_{k,m}$. Using results on the moments of a binomial distribution, we obtain, for $n\geq 1$,

$$
\begin{aligned}
&E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg)^{2}\,\Bigg|\, Q_{k,m}=1,\ \overline{N}_{k,m}=n\Bigg] \\
&\quad= \frac{1}{n^{2}} E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i} \Big(\sum_{j=1}^{n_k} 1\{m_{k,j}=m\} R_{k,j}W_{k,j} - nA_{k,m}\Big)\Bigg)^{2}\,\Bigg|\, Q_{k,m}=1,\ \overline{N}_{k,m}=n\Bigg] \\
&\quad= nE[A_{k,m}^{3}(1-A_{k,m})] + E[A_{k,m}^{2}(1-A_{k,m})(5-7A_{k,m})] \\
&\qquad + \frac{1}{n} E[A_{k,m}(1-A_{k,m})(6A_{k,m}^{2}-6A_{k,m}+1)].
\end{aligned}
$$
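The closed form above can be checked by exact enumeration. The sketch below assumes that, conditional on $A_{k,m}=a$ and $\overline{N}_{k,m}=n$, the number of sampled treated units $X$ is Binomial$(n, a)$, and it fixes $a$ at a few rational values; the function names are illustrative and not from the paper.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, a):
    """Exact Binomial(n, a) probabilities for a rational success probability a."""
    return [comb(n, x) * a**x * (1 - a)**(n - x) for x in range(n + 1)]

def second_moment_exact(n, a):
    """E[(X * (X/n - a))^2] for X ~ Binomial(n, a), by exact enumeration."""
    pmf = binom_pmf(n, a)
    return sum(p * (Fraction(x) * (Fraction(x, n) - a)) ** 2
               for x, p in enumerate(pmf))

def second_moment_closed_form(n, a):
    """Closed form from the text, evaluated at a fixed value a of A_{k,m}."""
    return (n * a**3 * (1 - a)
            + a**2 * (1 - a) * (5 - 7 * a)
            + Fraction(1, n) * a * (1 - a) * (6 * a**2 - 6 * a + 1))

# Hypothetical grid of (n, a) values used only for this check.
checks = [(n, Fraction(num, 10)) for n in (1, 2, 5, 12) for num in (1, 3, 5, 8)]
all_match = all(second_moment_exact(n, a) == second_moment_closed_form(n, a)
                for n, a in checks)
print(all_match)
```

Because the identity holds for each fixed $a$, it also holds after averaging over any distribution of $A_{k,m}$, which is the form used in the text.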

Therefore,

$$
\begin{aligned}
&E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg)^{2}\Bigg] \\
&\quad= n_{k,m} p_k q_k E[A_{k,m}^{3}(1-A_{k,m})] + q_k E[A_{k,m}^{2}(1-A_{k,m})(5-7A_{k,m})]\big(1-(1-p_k)^{n_{k,m}}\big) \\
&\qquad + q_k E[A_{k,m}(1-A_{k,m})(6A_{k,m}^{2}-6A_{k,m}+1)]\, r_{k,m},
\end{aligned}
$$

where

$$
r_{k,m} = \sum_{n=1}^{n_{k,m}} \frac{1}{n} \Pr(\overline{N}_{k,m}=n \,|\, Q_{k,m}=1) \leq \sum_{n=1}^{n_{k,m}} \Pr(\overline{N}_{k,m}=n \,|\, Q_{k,m}=1) \leq 1.
$$
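To make the bound on $r_{k,m}$ concrete, the following sketch evaluates it exactly under the assumption that, given $Q_{k,m}=1$, the sampled cluster size $\overline{N}_{k,m}$ is Binomial$(n_{k,m}, p_k)$. The helper names and the numerical values are hypothetical.

```python
from fractions import Fraction
from math import comb

def r_km(n_km, p):
    """r = sum_{n=1}^{n_km} (1/n) * Pr(N = n), with N ~ Binomial(n_km, p)."""
    return sum(comb(n_km, n) * p**n * (1 - p)**(n_km - n) / n
               for n in range(1, n_km + 1))

def prob_nonempty(n_km, p):
    """Pr(N >= 1) = 1 - (1 - p)^n_km, the middle quantity in the bound."""
    return 1 - (1 - p)**n_km

# Hypothetical cluster sizes and sampling probabilities.
cases = [(1, Fraction(1, 10)), (5, Fraction(1, 4)), (50, Fraction(1, 2))]
for n_km, p in cases:
    print(n_km, p, float(r_km(n_km, p)), float(prob_nonempty(n_km, p)))
```

Exact rational arithmetic is used so that the boundary case $n_{k,m}=1$, where the first inequality holds with equality, is not distorted by rounding.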

It follows that

$$
\begin{aligned}
E[\overline{\varphi}_{k,m}^{2}] &= (\tau_{k,m}-\tau_k)^{2} \Big( \frac{n_{k,m}}{n_k} E[A_{k,m}^{3}(1-A_{k,m})] + \frac{1}{n_k p_k} E[A_{k,m}^{2}(1-A_{k,m})(5-7A_{k,m})]\big(1-(1-p_k)^{n_{k,m}}\big) \\
&\qquad + \frac{1}{n_k p_k} E[A_{k,m}(1-A_{k,m})(6A_{k,m}^{2}-6A_{k,m}+1)]\, r_{k,m} \\
&\qquad - \frac{q_k}{n_k p_k} \big(E[A_{k,m}(1-A_{k,m})]\big)^{2} \big(1-(1-p_k)^{n_{k,m}}\big)^{2} \Big).
\end{aligned}
$$

Therefore,

$$
\sum_{m=1}^{m_k} E[\overline{\varphi}_{k,m}^{2}] = \sum_{m=1}^{m_k} (\tau_{k,m}-\tau_k)^{2} \Big(\frac{n_{k,m}}{n_k}\Big) E[A_{k,m}^{3}(1-A_{k,m})] + o(1).
$$

We will now study the covariance between $\varphi_{k,m}$ and $\overline{\varphi}_{k,m}$. Using results on the moments of a binomial distribution, we obtain, for $n\geq 1$,

$$
\begin{aligned}
&E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg)\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg)\,\Bigg|\, Q_{k,m}=1,\ \overline{N}_{k,m}=n\Bigg] \\
&\quad= E\Bigg[\frac{1-A_{k,m}}{n} \Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}\Bigg)^{2} \Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i} - nA_{k,m}\Bigg)\,\Bigg|\, Q_{k,m}=1,\ \overline{N}_{k,m}=n\Bigg] \\
&\quad= 2nE[A_{k,m}^{2}(1-A_{k,m})^{2}] + E[A_{k,m}(1-A_{k,m})^{2}(1-2A_{k,m})].
\end{aligned}
$$
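The cross moment can be verified the same way as the variance term. The sketch below again assumes that, given $A_{k,m}=a$ and $\overline{N}_{k,m}=n$, the count of sampled treated units $X$ is Binomial$(n, a)$, so that the conditional expectation reduces to $E[\tfrac{1-a}{n} X^{2}(X-na)]$; the names are illustrative.

```python
from fractions import Fraction
from math import comb

def cross_moment_exact(n, a):
    """E[((1 - a)/n) * X^2 * (X - n*a)] for X ~ Binomial(n, a), computed exactly."""
    total = Fraction(0)
    for x in range(n + 1):
        prob = comb(n, x) * a**x * (1 - a)**(n - x)
        total += prob * (1 - a) / n * x**2 * (x - n * a)
    return total

def cross_moment_closed_form(n, a):
    """Closed form from the text, evaluated at a fixed value a of A_{k,m}."""
    return 2 * n * a**2 * (1 - a)**2 + a * (1 - a)**2 * (1 - 2 * a)

# Hypothetical grid of (n, a) values used only for this check.
grid = [(n, Fraction(num, 8)) for n in (1, 3, 6, 10) for num in (1, 2, 5, 7)]
cross_ok = all(cross_moment_exact(n, a) == cross_moment_closed_form(n, a)
               for n, a in grid)
print(cross_ok)
```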

Therefore,

$$
\begin{aligned}
&E\Bigg[\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg)\Bigg(\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg)\Bigg] \\
&\quad= 2n_{k,m} p_k q_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] + q_k E[A_{k,m}(1-A_{k,m})^{2}(1-2A_{k,m})] \Pr(\overline{N}_{k,m}\geq 1 \,|\, Q_{k,m}=1).
\end{aligned}
$$

In addition,

$$
\begin{aligned}
&E\Bigg[\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg]\, E\Bigg[\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg] \\
&\quad= n_{k,m} p_k q_k^{2} \big(E[A_{k,m}(1-A_{k,m})]\big)^{2} \Pr(\overline{N}_{k,m}\geq 1 \,|\, Q_{k,m}=1).
\end{aligned}
$$

As a result,

$$
\begin{aligned}
E[\varphi_{k,m}\overline{\varphi}_{k,m}] &= \Big(2E[A_{k,m}^{2}(1-A_{k,m})^{2}] - q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) (\tau_{k,m}-\tau_k)^{2} \left(\frac{n_{k,m}}{n_k}\right) + \mathcal{O}\left(\frac{1}{n_k p_k}\right) \\
&\quad + \mathcal{O}\left(\frac{q_k}{p_k \min_m n_{k,m}} \big(p_k \min_m n_{k,m} (1-p_k)^{\min_m n_{k,m}}\big)\right).
\end{aligned}
$$

Notice that $m_k/(n_k p_k)\rightarrow 0$. In addition, $m_k q_k/(p_k \min_m n_{k,m})\rightarrow 0$ and

$$
\begin{aligned}
p_k \min_m n_{k,m} (1-p_k)^{\min_m n_{k,m}} &= p_k \min_m n_{k,m} \left(1 - \frac{p_k \min_m n_{k,m}}{\min_m n_{k,m}}\right)^{\min_m n_{k,m}} \\
&< p_k \min_m n_{k,m}\, e^{-p_k \min_m n_{k,m}} \longrightarrow 0.
\end{aligned}
$$
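The exponential bound can be illustrated numerically. The sketch below uses a hypothetical sequence with $\min_m n_{k,m} = k$ and $p_k = k^{-1/2}$, chosen only so that $p_k \min_m n_{k,m} = \sqrt{k}$ grows; it is not a sequence taken from the paper.

```python
import math

def exact_term(p, n_min):
    """p * n_min * (1 - p)**n_min, the quantity bounded in the text."""
    return p * n_min * (1.0 - p) ** n_min

def exp_bound(p, n_min):
    """Upper bound x * exp(-x) with x = p * n_min."""
    x = p * n_min
    return x * math.exp(-x)

# Hypothetical sequence: n_min = k and p = k**-0.5, so p * n_min = sqrt(k).
sequence = [(k ** -0.5, k) for k in (100, 400, 1600, 10000)]
rows = [(p * n, exact_term(p, n), exp_bound(p, n)) for p, n in sequence]
for x, term, bound in rows:
    print(f"p*n_min = {x:6.1f}   term = {term:.3e}   bound = {bound:.3e}")
```

The bound follows from $1-p < e^{-p}$ for $p>0$, and $x e^{-x}\rightarrow 0$ as $x\rightarrow\infty$, so both columns shrink as $p_k \min_m n_{k,m}$ grows.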

Therefore,

$$
\sum_{m=1}^{m_k} E[\varphi_{k,m}\overline{\varphi}_{k,m}] = \Big(2E[A_{k,m}^{2}(1-A_{k,m})^{2}] - q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} (\tau_{k,m}-\tau_k)^{2} \left(\frac{n_{k,m}}{n_k}\right) + o(1).
$$

Next, we will study the remaining covariances between $\psi_{k,m}$, $\varphi_{k,m}$, $\overline{\psi}_{k,m}$, and $\overline{\varphi}_{k,m}$. Because the intra-cluster errors, $e_{k,i}(1)$ and $e_{k,i}(0)$, sum to zero, it can be easily seen that $E[\psi_{k,m}\varphi_{k,m}] = E[\psi_{k,m}\overline{\varphi}_{k,m}] = 0$. It can also be seen that the inter-cluster sums of covariances between $\overline{\psi}_{k,m}$ and any of the other terms go to zero. To prove this for the covariance with $\psi_{k,m}$, we have

$$
\begin{aligned}
\left(\sum_{m=1}^{m_k} E\big[|\psi_{k,m}\overline{\psi}_{k,m}|\big]\right)^{2} &\leq \left(\sum_{m=1}^{m_k} \big(E[\psi_{k,m}^{2}]\, E[\overline{\psi}_{k,m}^{2}]\big)^{1/2}\right)^{2} \\
&\leq \sum_{m=1}^{m_k} E[\psi_{k,m}^{2}] \sum_{m=1}^{m_k} E[\overline{\psi}_{k,m}^{2}] \\
&= \mathcal{O}(1)\, o(1) = o(1).
\end{aligned}
$$

The same argument and result apply to $E[\overline{\psi}_{k,m}\varphi_{k,m}]$ and $E[\overline{\psi}_{k,m}\overline{\varphi}_{k,m}]$. Putting all the pieces together, we obtain

$$
n_k p_k q_k E\big[D_k^{2}(\widehat{\tau}_k^{\,\mathrm{fixed}}-\tau_k)^{2}\big] = f_k + o(1),
$$

where

$$
\begin{aligned}
f_k &= E[A_{k,m}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(1) + E[A_{k,m}^{2}(1-A_{k,m})]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(0) \\
&\quad - p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} \big(e_{k,i}(1)-e_{k,i}(0)\big)^{2} \\
&\quad + \Big(E[A_{k,m}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + \Big(p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}(n_{k,m}-1)}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + E[A_{k,m}^{3}(1-A_{k,m})] \sum_{m=1}^{m_k} (\tau_{k,m}-\tau_k)^{2} \Big(\frac{n_{k,m}}{n_k}\Big) \\
&\quad - 2\Big(2E[A_{k,m}^{2}(1-A_{k,m})^{2}] - q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} (\tau_{k,m}-\tau_k)^{2} \left(\frac{n_{k,m}}{n_k}\right).
\end{aligned}
$$

Collecting terms with identical factors, we obtain

$$
\begin{aligned}
f_k &= E[A_{k,m}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(1) + E[A_{k,m}^{2}(1-A_{k,m})]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(0) \\
&\quad - p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} \big(e_{k,i}(1)-e_{k,i}(0)\big)^{2} \\
&\quad + \Big(E[A_{k,m}(1-A_{k,m})^{2}] - (4+p_k) E[A_{k,m}^{2}(1-A_{k,m})^{2}] \\
&\qquad\qquad + E[A_{k,m}^{3}(1-A_{k,m})] + 2q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + \Big(p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}^{2}}{n_k} (\tau_{k,m}-\tau_k)^{2}.
\end{aligned}
$$

The first three terms in the expression above depend on intra-cluster heterogeneity in potential outcomes and treatment effects. The last two terms depend on inter-cluster variation in average treatment effects.
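The collection step can be made explicit with some shorthand, which is introduced here only for illustration: write $a_1 = E[A_{k,m}(1-A_{k,m})^{2}]$, $a_2 = E[A_{k,m}^{2}(1-A_{k,m})^{2}]$, $a_3 = E[A_{k,m}^{3}(1-A_{k,m})]$, and $b = (E[A_{k,m}(1-A_{k,m})])^{2}$. Using $n_{k,m}(n_{k,m}-1) = n_{k,m}^{2} - n_{k,m}$, the coefficient on $\sum_{m} (n_{k,m}/n_k)(\tau_{k,m}-\tau_k)^{2}$ is

$$
\begin{aligned}
(a_1 - p_k q_k b) - (p_k a_2 - p_k q_k b) + a_3 - 2(2a_2 - q_k b) = a_1 - (4+p_k)\, a_2 + a_3 + 2 q_k b,
\end{aligned}
$$

while the coefficient on $\sum_{m} (n_{k,m}^{2}/n_k)(\tau_{k,m}-\tau_k)^{2}$ remains $p_k a_2 - p_k q_k b$. The two $p_k q_k b$ contributions cancel in the first coefficient.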

A more compact expression for $f_k$ is

$$
\begin{aligned}
f_k &= E[A_{k,m}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(1) + E[A_{k,m}^{2}(1-A_{k,m})]\, \frac{1}{n_k}\sum_{i=1}^{n_k} e_{k,i}^{2}(0) \\
&\quad - p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}]\, \frac{1}{n_k}\sum_{i=1}^{n_k} \big(e_{k,i}(1)-e_{k,i}(0)\big)^{2} \\
&\quad + \Big(E[A_{k,m}(1-A_{k,m})] - (5+p_k) E[A_{k,m}^{2}(1-A_{k,m})^{2}] \\
&\qquad\qquad + 2q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} \\
&\quad + \Big(p_k E[A_{k,m}^{2}(1-A_{k,m})^{2}] - p_k q_k (E[A_{k,m}(1-A_{k,m})])^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}^{2}}{n_k} (\tau_{k,m}-\tau_k)^{2}.
\end{aligned}
\tag{A.10}
$$
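The compact form rests on the pointwise identity $a(1-a)^{2} + a^{3}(1-a) = a(1-a) - a^{2}(1-a)^{2}$. The sketch below checks that the two versions of the coefficient on $\sum_m (n_{k,m}/n_k)(\tau_{k,m}-\tau_k)^{2}$ agree exactly for a hypothetical discrete distribution of $A_{k,m}$ and hypothetical values of $p_k$ and $q_k$; none of these numbers come from the paper.

```python
from fractions import Fraction

def expect(g, support, probs):
    """Exact expectation of g(A) for a discrete distribution of A."""
    return sum(w * g(a) for a, w in zip(support, probs))

def coeff_collected(support, probs, p, q):
    """Coefficient as written after collecting terms (with the 4 + p_k factor)."""
    m = lambda g: expect(g, support, probs)
    return (m(lambda a: a * (1 - a)**2)
            - (4 + p) * m(lambda a: a**2 * (1 - a)**2)
            + m(lambda a: a**3 * (1 - a))
            + 2 * q * m(lambda a: a * (1 - a))**2)

def coeff_compact(support, probs, p, q):
    """Coefficient in the compact expression (with the 5 + p_k factor)."""
    m = lambda g: expect(g, support, probs)
    return (m(lambda a: a * (1 - a))
            - (5 + p) * m(lambda a: a**2 * (1 - a)**2)
            + 2 * q * m(lambda a: a * (1 - a))**2)

# Hypothetical distribution of A and hypothetical sampling probabilities.
support = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
p_k, q_k = Fraction(1, 3), Fraction(2, 5)

same = coeff_collected(support, probs, p_k, q_k) == coeff_compact(support, probs, p_k, q_k)
print(same)
```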

Notice that the first four terms in (A.10) are bounded, and that

$$
E[A_{k,m}^{2}(1-A_{k,m})^{2}] - q_k \big(E[A_{k,m}(1-A_{k,m})]\big)^{2} = \mathrm{var}\big(A_{k,m}(1-A_{k,m})\big) + (1-q_k)\big(E[A_{k,m}(1-A_{k,m})]\big)^{2}.
$$

Assume that

$$
\liminf_{k\rightarrow\infty} \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2} > 0,
\tag{A.11}
$$

and

$$
\liminf_{k\rightarrow\infty}\ \mathrm{var}\big(A_{k,m}(1-A_{k,m})\big) \vee (1-q_k) > 0.
\tag{A.12}
$$

The last term in equation (A.10) is greater than

$$
p_k \min_m n_{k,m} \Big(E[A_{k,m}^{2}(1-A_{k,m})^{2}] - \big(E[A_{k,m}(1-A_{k,m})]\big)^{2}\Big) \sum_{m=1}^{m_k} \frac{n_{k,m}}{n_k} (\tau_{k,m}-\tau_k)^{2},
$$

which diverges to infinity because $p_k \min_m n_{k,m}\rightarrow\infty$. That is, the last term dominates the variance in large samples provided that (A.11) and (A.12) hold.

We will now derive the large sample distribution of $\widehat{\tau}_k^{\,\mathrm{fixed}}$. To show that Lyapunov’s condition holds for $F_k$, notice that

$$
\begin{aligned}
&\big|(\psi_{k,m}-\overline{\psi}_{k,m}) + (\varphi_{k,m}-\overline{\varphi}_{k,m})\big|^{3} \\
&\quad= \frac{1}{(n_k p_k q_k)^{3/2}} \Bigg| \sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i} \Big(\big(e_{k,i}(1)+\tau_{k,m}-\tau_k\big) W_{k,i} + e_{k,i}(0)(1-W_{k,i})\Big)(W_{k,i}-\overline{W}_{k,m}) \\
&\qquad\qquad - (\tau_{k,m}-\tau_k)\, q_k E[A_{k,m}(1-A_{k,m})]\big(1-(1-p_k)^{n_{k,m}}\big) \Bigg|^{3},
\end{aligned}
$$

where the last term inside the absolute value comes from the bias correction. Notice that,

$$\begin{aligned}
\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}&R_{k,i}(e_{k,i}(1)+\tau_{k,m}-\tau_{k})W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg|^{3}\\
&=\Bigg|(1-\overline{W}_{k,m})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(e_{k,i}(1)+\tau_{k,m}-\tau_{k})W_{k,i}\Bigg|^{3}\\
&\leq c\,\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}\Bigg|^{3}\\
&\leq c\,\overline{N}_{k,m}^{3}.
\end{aligned}$$

From the formula of the third moment of a binomial random variable, we obtain

$$\begin{aligned}
E[\overline{N}_{k,m}^{3}]&=q_{k}E[\overline{N}_{k,m}^{3}|Q_{k,m}=1]\\
&=n_{k,m}^{3}p_{k}^{3}q_{k}+o(n_{k,m}^{3}p_{k}^{3}q_{k}),
\end{aligned}$$

as $p_{k}n_{k,m}\rightarrow\infty$. Now,

$$\begin{aligned}
\frac{1}{f_{k}^{3/2}}\sum_{m=1}^{m_{k}}E\Bigg[\Bigg|&\frac{1}{\sqrt{n_{k}p_{k}q_{k}}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(e_{k,i}(1)+\tau_{k,m}-\tau_{k})W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg|^{3}\Bigg]\\
&\leq c\,\frac{n_{k}\max_{m}n_{k,m}^{2}p^{3}_{k}q_{k}}{(n_{k}p_{k}q_{k})^{3/2}(p_{k}\min_{m}n_{k,m})^{3/2}}\leq c\left(\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\right)^{2}\frac{1}{(m_{k}q_{k})^{1/2}}\longrightarrow 0.
\end{aligned}$$
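The binomial third-moment step used above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it verifies the standard closed form $E[N^3]=np+3n(n-1)p^2+n(n-1)(n-2)p^3$ for $N\sim\mbox{Binomial}(n,p)$ by exact enumeration, and then shows that the ratio $E[N^3]/(np)^3$ approaches one as $np$ grows. The function names are illustrative, and the sampling indicator $Q_{k,m}$ is ignored (it only contributes the factor $q_k$).

```python
from fractions import Fraction
from math import comb


def third_moment_by_enumeration(n, p):
    """E[N^3] for N ~ Binomial(n, p), summing over the full support."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) * j**3 for j in range(n + 1))


def third_moment_closed_form(n, p):
    """E[N^3] from the factorial moments of the binomial distribution."""
    return n * p + 3 * n * (n - 1) * p**2 + n * (n - 1) * (n - 2) * p**3


def ratio_to_leading_term(n, p):
    """E[N^3] / (n p)^3, which should tend to 1 as n * p grows."""
    return third_moment_closed_form(n, p) / (n * p) ** 3


# Exact check with rational arithmetic on a small example.
p_small = Fraction(1, 4)
exact_match = third_moment_by_enumeration(12, p_small) == third_moment_closed_form(12, p_small)

# The remainder relative to the leading term shrinks as n * p increases.
p = Fraction(1, 100)
gap_small_np = ratio_to_leading_term(10**3, p) - 1   # n * p = 10
gap_large_np = ratio_to_leading_term(10**6, p) - 1   # n * p = 10,000
```

The relative remainder is of order $1/(np)$, which is why the expansion requires $p_k n_{k,m}\rightarrow\infty$.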

Similar calculations deliver the analogous result for the term involving $e_{k,i}(0)$, and proving the result for the bias term is straightforward. Therefore, we obtain

$$\frac{1}{f_{k}^{3/2}}\sum_{m=1}^{m_{k}}|(\psi_{k,m}-\overline{\psi}_{k,m})+(\varphi_{k,m}-\overline{\varphi}_{k,m})|^{3}\longrightarrow 0.$$

By the Central Limit Theorem for arrays, this implies

$$\sqrt{n_{k}p_{k}q_{k}}F_{k}/f_{k}^{1/2}\stackrel{d}{\longrightarrow}N(0,1).$$

Let $\tilde{v}_{k}=f_{k}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2}$. Then,

$$\sqrt{n_{k}p_{k}q_{k}}(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})/\tilde{v}_{k}^{1/2}\stackrel{d}{\longrightarrow}N(0,1).$$

As a result,

$$\sqrt{N_{k}}(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})/\tilde{v}_{k}^{1/2}\stackrel{d}{\longrightarrow}N(0,1).$$

A.3.2 Estimation of the variance

Let

$$N_{k,m,0}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(1-W_{k,i})$$

and

$$N_{k,m,1}=\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}.$$

Let

$$\overline{Y}_{k,m}=\frac{1}{\overline{N}_{k,m}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}Y_{k,i}.$$

Then,

$$\overline{Y}_{k,m}=\widehat{\alpha}_{k,m}+\widehat{\tau}_{k,m}\overline{W}_{k,m},$$

where

$$\begin{aligned}
\widehat{\alpha}_{k,m}&=\frac{1}{N_{k,m,0}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(1-W_{k,i})Y_{k,i},\\
\widehat{\tau}_{k,m}&=\frac{1}{N_{k,m,1}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}Y_{k,i}-\frac{1}{N_{k,m,0}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(1-W_{k,i})Y_{k,i},
\end{aligned}$$

and, as before,

$$\overline{W}_{k,m}=\frac{1}{\overline{N}_{k,m}\vee 1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}.$$

Let $\widetilde{U}_{k,i}=\widetilde{Y}_{k,i}-\widehat{\tau}_{k}^{\,\rm fixed}\widetilde{W}_{k,i}$, where $\widetilde{Y}_{k,i}=Y_{k,i}-\overline{Y}_{k,m_{k,i}}$, $\widetilde{W}_{k,i}=W_{k,i}-\overline{W}_{k,m_{k,i}}$, and $\widehat{\tau}_{k}^{\,\rm fixed}$ is the within estimator of $\tau_{k}$. Let $\widetilde{\Sigma}_{k}=\sum_{m=1}^{m_{k}}\widetilde{\Sigma}_{k,m}$, where

$$\widetilde{\Sigma}_{k,m}=\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\widetilde{U}_{k,i}\right)^{2}.$$

Also, let

$$\widetilde{Q}_{k}=\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}.$$

Then, the cluster estimator of the variance of $\sqrt{N_{k}}(\widehat{\tau}^{\,\rm fixed}_{k}-\tau_{k})$ is

$$\widetilde{V}_{k}^{\rm cluster}=N_{k}\widetilde{Q}_{k}^{-1}\widetilde{\Sigma}_{k}\widetilde{Q}_{k}^{-1}.$$
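To make the definitions above concrete, here is a minimal NumPy sketch of how $\widetilde{Q}_k$, $\widetilde{\Sigma}_k$, and $\widetilde{V}_k^{\rm cluster}$ could be computed from sampled data. It assumes that only sampled units are passed in (so $R_{k,i}=1$ for every row and $N_k$ is the number of rows), and that the within estimator is the usual OLS slope of the cluster-demeaned outcome on the cluster-demeaned treatment. The function name and argument names are hypothetical.

```python
import numpy as np


def within_estimator_cluster_variance(y, w, cluster):
    """Return (tau_fixed, v_cluster) for sampled units only.

    y, w, cluster: 1-D sequences of outcomes, binary treatments, cluster labels.
    v_cluster estimates the variance of sqrt(N) * (tau_fixed - tau).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Map arbitrary cluster labels to 0..M-1.
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    counts = np.bincount(idx)

    # Cluster means and within-cluster deviations (Y-tilde, W-tilde).
    y_tilde = y - (np.bincount(idx, weights=y) / counts)[idx]
    w_tilde = w - (np.bincount(idx, weights=w) / counts)[idx]

    q_tilde = np.sum(w_tilde**2)                      # Q-tilde
    tau_fixed = np.sum(w_tilde * y_tilde) / q_tilde   # within estimator
    u_tilde = y_tilde - tau_fixed * w_tilde           # U-tilde residuals

    # Sigma-tilde: squared within-cluster sums of W-tilde * U-tilde.
    cluster_scores = np.bincount(idx, weights=w_tilde * u_tilde)
    sigma_tilde = np.sum(cluster_scores**2)

    v_cluster = len(y) * sigma_tilde / q_tilde**2     # N * Q^{-1} Sigma Q^{-1}
    return tau_fixed, v_cluster


# Toy data: two clusters, one treated and one control unit in each.
tau_hat, v_hat = within_estimator_cluster_variance(
    y=[1.0, 3.0, 2.0, 6.0], w=[0, 1, 0, 1], cluster=["a", "a", "b", "b"]
)
# For this toy data, tau_hat = 3.0 and v_hat = 2.0.
```

The sketch omits finite-sample degrees-of-freedom corrections that statistical packages often apply, because the formula in the text does not include them.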

We know already that

$$\frac{1}{n_{k}p_{k}q_{k}}\widetilde{Q}_{k}-(\mu_{k}(1-\mu_{k})-\sigma_{k}^{2})\stackrel{p}{\longrightarrow}0,$$

with $\mu_{k}(1-\mu_{k})-\sigma_{k}^{2}$ bounded away from zero. To establish convergence of $\widetilde{\Sigma}_{k}/(n_{k}p_{k}q_{k}f_{k})$, first notice that, for $m_{k,i}=m$, we have

$$\begin{aligned}
\widetilde{U}_{k,i}&=Y_{k,i}-(\widehat{\alpha}_{k,m}+\widehat{\tau}_{k,m}\overline{W}_{k,m})-\widehat{\tau}_{k}^{\,\rm fixed}(W_{k,i}-\overline{W}_{k,m})\\
&=y_{k,i}(1)W_{k,i}+y_{k,i}(0)(1-W_{k,i})-(\alpha_{k,m}+\tau_{k,m}\overline{W}_{k,m})-\widehat{\tau}_{k}^{\,\rm fixed}(W_{k,i}-\overline{W}_{k,m})\\
&\quad-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m}\\
&=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\widehat{\tau}_{k}^{\,\rm fixed})(W_{k,i}-\overline{W}_{k,m})\\
&\quad-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m}\\
&=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})\\
&\quad-(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m}.
\end{aligned}$$

For $m_{k,i}=m$ and $N_{k,m,0},N_{k,m,1}\geq 1$, let

$$\overline{U}_{k,i}=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\tau_{k})(W_{k,i}-\overline{W}_{k,m}),$$

and let $\overline{U}_{k,i}=0$ for $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}=0$. Then, for $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}\geq 1$, we have

$$\widetilde{U}_{k,i}-\overline{U}_{k,i}=-(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m}.$$

Then, because $\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}=0$ for every cluster $m$, the terms of $\widetilde{U}_{k,i}-\overline{U}_{k,i}$ that are constant within a cluster drop out, and

$$\begin{aligned}
\Bigg(&\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\widetilde{U}_{k,i}\Bigg)^{2}\\
&=\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\Big(\overline{U}_{k,i}+\big(\widetilde{U}_{k,i}-\overline{U}_{k,i}\big)\Big)\Bigg)^{2}\\
&=\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\Big(\overline{U}_{k,i}-(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})\Big)\Bigg)^{2}\\
&=\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\overline{U}_{k,i}-(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}.
\end{aligned}$$

Using the formula for the second moment of a binomial distribution and $n\geq 1$, we obtain

$$\begin{aligned}
E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Big|\overline{N}_{k,m}=n\Bigg]\\
&=E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(1-\overline{W}_{k,m})R_{k,i}W_{k,i}\Bigg)^{2}\Big|\overline{N}_{k,m}=n\Bigg]\\
&\leq E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}\Bigg)^{2}\Big|\overline{N}_{k,m}=n\Bigg]\\
&\leq n^{2}+n.
\end{aligned}$$

From the formula of the sum of the first two moments of a binomial distribution, we obtain

$$\sum_{m=1}^{m_{k}}E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Bigg]\leq\sum_{m=1}^{m_{k}}(n_{k,m}^{2}p_{k}^{2}q_{k}+n_{k,m}p_{k}q_{k}).$$

Therefore,

$$\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}f_{k}}(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})^{2}&\sum_{m=1}^{m_{k}}E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Bigg]\\
&\leq\frac{n_{k}p_{k}q_{k}}{f_{k}}(\widehat{\tau}_{k}^{\,\rm fixed}-\tau_{k})^{2}\frac{1}{(n_{k}p_{k}q_{k})^{2}}\sum_{m=1}^{m_{k}}(n_{k,m}^{2}p_{k}^{2}q_{k}+n_{k,m}p_{k}q_{k})\\
&=\mathcal{O}_{p}(1)\left(\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\frac{1}{m_{k}q_{k}}+\frac{1}{n_{k}p_{k}q_{k}}\right)\stackrel{p}{\longrightarrow}0.
\end{aligned}$$

Now, notice that

$$\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}f_{k}}&\sum_{m=1}^{m_{k}}\left(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}\overline{U}_{k,i}\right)^{2}\\
&=\frac{1}{n_{k}p_{k}q_{k}f_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(W_{k,i}-\overline{W}_{k,m})\\
&\qquad\qquad+(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\overline{W}_{k,m})^{2}\Bigg)^{2}.
\end{aligned}$$

Equation (A.9) (and the analogous result for the sum involving terms with $e_{k,i}(0)$) implies

$$\frac{1}{n_{k}p_{k}q_{k}f_{k}}\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(\overline{W}_{k,m}-A_{k,m})\Bigg)^{2}\stackrel{p}{\longrightarrow}0.$$

As a result, it is enough to establish convergence of $\overline{\Sigma}_{k}/(n_{k}p_{k}q_{k}f_{k})$, where

$$\begin{aligned}
\overline{\Sigma}_{k}&=\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})\big)(W_{k,i}-A_{k,m})\\
&\qquad+(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\overline{W}_{k,m})^{2}\Bigg)^{2}\\
&=\sum_{m=1}^{m_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\Big(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}q_{k}A_{k,m}(1-A_{k,m})\Big)e_{k,i}(1)\\
&\qquad+\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\Big(R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m})+p_{k}q_{k}A_{k,m}(1-A_{k,m})\Big)e_{k,i}(0)\\
&\qquad+(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\overline{W}_{k,m})^{2}\Bigg)^{2}.
\end{aligned}$$

We will next show that

$$\frac{1}{n_{k}p_{k}q_{k}f_{k}}\overline{\Sigma}_{k}-\frac{f_{k}^{\rm cluster}}{f_{k}}\stackrel{p}{\longrightarrow}0, \tag{A.13}$$

where

$$\begin{aligned}
f_{k}^{\rm cluster}&=\frac{1}{n_{k}}E[A_{k,m}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}e_{k,i}^{2}(1)\\
&\quad+\frac{1}{n_{k}}E[A^{2}_{k,m}(1-A_{k,m})]\sum_{i=1}^{n_{k}}e_{k,i}^{2}(0)\\
&\quad-\frac{1}{n_{k}}p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}(e_{k,i}(1)-e_{k,i}(0))^{2}\\
&\quad+\big(E[A_{k,m}(1-A_{k,m})]-(5+p_{k})E[A^{2}_{k,m}(1-A_{k,m})^{2}]\big)\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}\\
&\quad+p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{m=1}^{m_{k}}\frac{n^{2}_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{aligned}$$

Let

$$\begin{aligned}
X_{k,m}&=\frac{1}{n_{k}p_{k}q_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(e_{k,i}(0)(1-W_{k,i})+e_{k,i}(1)W_{k,i}\big)(W_{k,i}-A_{k,m})\\
&\qquad\qquad+(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-\overline{W}_{k,m})^{2}\Bigg)^{2}.
\end{aligned}$$

Using the result in equation (A.3.1) and results on the moments of the binomial distribution (see intermediate calculations in section A.7), we obtain

$$\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}}E[\overline{\Sigma}_{k}]&=\sum_{m=1}^{m_{k}}E[X_{k,m}]\\
&=f_{k}^{\rm cluster}+o(1).
\end{aligned}$$

Therefore, to show that equation (A.13) holds, we will show

$$\frac{1}{f_{k}^{2}}\sum_{m=1}^{m_{k}}E[X_{k,m}^{2}]\longrightarrow 0. \tag{A.14}$$

Let

$$\begin{aligned}
\theta_{k}&=E[(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}A_{k,m}(1-A_{k,m}))^{2}|m_{k,i}=m,Q_{k,m}=1]\\
&=p_{k}\left(E[A_{k,m}(1-A_{k,m})^{2}]-p_{k}E[A_{k,m}^{2}(1-A_{k,m})^{2}]\right),
\end{aligned}$$

and

$$\begin{aligned}
\pi_{k}&=E[(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}A_{k,m}(1-A_{k,m}))^{4}|m_{k,i}=m,Q_{k,m}=1]\\
&=p_{k}E[(W_{k,i}(W_{k,i}-A_{k,m})-p_{k}A_{k,m}(1-A_{k,m}))^{4}|m_{k,i}=m]+p_{k}^{4}(1-p_{k})E[A^{4}_{k,m}(1-A_{k,m})^{4}].
\end{aligned}$$

Let

$$\begin{aligned}
X_{k,m,1}&=\frac{1}{n_{k}p_{k}q_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})e_{k,i}(1)\Bigg)^{2}\\
&=\frac{Q_{k,m}}{n_{k}p_{k}q_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})-p_{k}A_{k,m}(1-A_{k,m}))e_{k,i}(1)\Bigg)^{2}.
\end{aligned}$$

Then,

$$\begin{aligned}
E[X_{k,m,1}^{2}]&=q_{k}E[X_{k,m,1}^{2}|Q_{k,m}=1]\\
&=\frac{\pi_{k}}{n^{2}_{k}p^{2}_{k}q_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{4}(1)\\
&\quad+\frac{6\theta^{2}_{k}}{n^{2}_{k}p^{2}_{k}q_{k}}\sum_{i=1}^{n_{k}-1}\sum_{j=i+1}^{n_{k}}1\{m_{k,i}=m_{k,j}=m\}e_{k,i}^{2}(1)e_{k,j}^{2}(1).
\end{aligned}$$
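The last display has the structure of the fourth moment of a weighted sum of independent mean-zero terms: each unit contributes its own fourth moment, and each pair contributes six times the product of second moments. A minimal sketch of that identity follows, under a simplifying assumption that is not the paper's setting: the terms are i.i.d. centered Bernoulli draws $Z_i$ with weights $e_i$ (the paper's terms involve $R_{k,i}$, $W_{k,i}$, and $A_{k,m}$). The names `theta` and `pi` mirror $\theta_k$ and $\pi_k$ only by analogy.

```python
from fractions import Fraction
from itertools import combinations, product


def fourth_moment_by_enumeration(e, p):
    """E[(sum_i e_i Z_i)^4] with Z_i = B_i - p and B_i ~ Bernoulli(p) independent."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(e)):
        prob = Fraction(1)
        weighted_sum = Fraction(0)
        for b, e_i in zip(bits, e):
            prob *= p if b else 1 - p
            weighted_sum += (b - p) * e_i
        total += prob * weighted_sum**4
    return total


def fourth_moment_by_formula(e, p):
    """pi * sum_i e_i^4 + 6 * theta^2 * sum_{i<j} e_i^2 e_j^2."""
    theta = p * (1 - p)                          # E[Z^2]
    pi = p * (1 - p) ** 4 + (1 - p) * p**4       # E[Z^4]
    own_terms = pi * sum(e_i**4 for e_i in e)
    pair_terms = 6 * theta**2 * sum(a**2 * b**2 for a, b in combinations(e, 2))
    return own_terms + pair_terms


weights = [Fraction(1), Fraction(-2), Fraction(3), Fraction(1, 2)]
p = Fraction(1, 3)
lhs = fourth_moment_by_enumeration(weights, p)
rhs = fourth_moment_by_formula(weights, p)
```

Rational arithmetic makes the comparison exact, so `lhs == rhs` holds without any tolerance. Cross terms with an odd power of some $Z_i$ vanish because the terms are independent with mean zero.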

Therefore, because $n_{k}p_{k}q_{k}\rightarrow\infty$ and $m_{k}q_{k}\rightarrow\infty$, we obtain

$$\begin{aligned}
\sum_{m=1}^{m_{k}}E[X_{k,m,1}^{2}]&\leq\frac{c}{n_{k}p_{k}q_{k}}\left(\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e_{k,i}^{4}(1)\right)+\frac{c}{m_{k}q_{k}}\frac{\max_{m}n^{2}_{k,m}}{\min_{m}n_{k,m}^{2}}\\
&\qquad\times\left(\frac{1}{m_{k}}\sum_{m=1}^{m_{k}}\frac{1}{\max_{m}n^{2}_{k,m}}\sum_{i=1}^{n_{k}-1}\sum_{j=i+1}^{n_{k}}1\{m_{k,i}=m_{k,j}=m\}e_{k,i}^{2}(1)e_{k,j}^{2}(1)\right)\\
&\longrightarrow 0.
\end{aligned} \tag{A.15}$$

Using the same argument, we obtain

$$\sum_{m=1}^{m_{k}}E[X_{k,m,2}^{2}]\longrightarrow 0, \tag{A.16}$$

where

$$X_{k,m,2}=\frac{1}{n_{k}p_{k}q_{k}}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(1-W_{k,i})(W_{k,i}-A_{k,m})e_{k,i}(0)\Bigg)^{2}.$$

Notice that equations (A.15) and (A.16) imply

1fk2m=1mkE[Xk,m,12]01superscriptsubscript𝑓𝑘2superscriptsubscript𝑚1subscript𝑚𝑘𝐸delimited-[]superscriptsubscript𝑋𝑘𝑚120\frac{1}{f_{k}^{2}}\sum_{m=1}^{m_{k}}E[X_{k,m,1}^{2}]\longrightarrow 0

and

1fk2m=1mkE[Xk,m,22]0.1superscriptsubscript𝑓𝑘2superscriptsubscript𝑚1subscript𝑚𝑘𝐸delimited-[]superscriptsubscript𝑋𝑘𝑚220\frac{1}{f_{k}^{2}}\sum_{m=1}^{m_{k}}E[X_{k,m,2}^{2}]\longrightarrow 0.

Notice that the last two equations hold even if fksubscript𝑓𝑘f_{k} is bounded (e.g., when τk,mτk=0subscript𝜏𝑘𝑚subscript𝜏𝑘0\tau_{k,m}-\tau_{k}=0 for all k𝑘k and m𝑚m), as long as fksubscript𝑓𝑘f_{k} is bounded away from zero in large samples. In section A.3.3 we derive conditions so that fksubscript𝑓𝑘f_{k} is bounded away from zero in large samples even if τk,mτk=0subscript𝜏𝑘𝑚subscript𝜏𝑘0\tau_{k,m}-\tau_{k}=0 for all k𝑘k and m𝑚m. Now, let

Xk,m,3=1nkpkqk((τk,mτk)i=1nk1{mk,i=m}Rk,iWk,i(Wk,i Wk,m))2.subscript𝑋𝑘𝑚31subscript𝑛𝑘subscript𝑝𝑘subscript𝑞𝑘superscriptsubscript𝜏𝑘𝑚subscript𝜏𝑘superscriptsubscript𝑖1subscript𝑛𝑘1subscript𝑚𝑘𝑖𝑚subscript𝑅𝑘𝑖subscript𝑊𝑘𝑖subscript𝑊𝑘𝑖subscript W𝑘𝑚2X_{k,m,3}=\frac{1}{n_{k}p_{k}q_{k}}\Bigg{(}(\tau_{k,m}-\tau_{k})\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$W$\kern-0.18004pt}}}_{k,m})\Bigg{)}^{2}.

Recall that, under the conditions in (A.11) and (A.12), $f_{k}\rightarrow\infty$ and $p_{k}\min n_{k,m}/f_{k}$ is bounded for large $k$ and, therefore, $p_{k}\max n_{k,m}/f_{k}$ is bounded for large $k$. Then (see intermediate calculations at the end of this document), for large $k$,

\[
\begin{aligned}
\frac{1}{f_{k}^{2}}\sum_{m=1}^{m_{k}}E[X_{k,m,3}^{2}] &=\frac{1}{(n_{k}p_{k}q_{k}f_{k})^{2}}\sum_{m=1}^{m_{k}}n_{k,m}^{4}p_{k}^{4}q_{k}(\tau_{k,m}-\tau_{k})^{4}\left(1+\mathcal{O}\left(\frac{1}{p_{k}\min_{m}n_{k,m}}\right)\right)\\
&=\frac{p_{k}\max_{m}n^{2}_{k,m}}{m_{k}q_{k}f_{k}\min_{m}n_{k,m}}\left(\frac{p_{k}}{f_{k}}\sum_{m=1}^{m_{k}}\frac{n_{k,m}^{2}}{n_{k}}(\tau_{k,m}-\tau_{k})^{4}\right)\left(1+\mathcal{O}\left(\frac{1}{p_{k}\min_{m}n_{k,m}}\right)\right)\\
&=\mathcal{O}\left(\frac{1}{m_{k}q_{k}}\right)\left(1+\mathcal{O}\left(\frac{1}{p_{k}\min_{m}n_{k,m}}\right)\right)\rightarrow 0.
\end{aligned}
\]

Now, Hölder’s inequality implies that equation (A.14) holds (see intermediate calculations).

Now let,

\[
\tilde{v}_{k}^{\rm cluster}=f_{k}^{\rm cluster}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2}.
\]

We obtain,

\[
\frac{\widetilde{V}_{k}^{\rm cluster}}{\tilde{v}_{k}}=\frac{\tilde{v}_{k}^{\rm cluster}}{\tilde{v}_{k}}+o_{p}(1).
\]

We will next establish the analogous result for the heteroskedasticity-robust variance estimator. Let

\[
\widetilde{\Sigma}_{k}^{\rm robust}=\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\widetilde{U}_{k,i}^{2}.
\]

Then, the heteroskedasticity-robust estimator of the variance of $\sqrt{N_{k}}(\widehat{\tau}^{\,{\rm fixed}}_{k}-\tau_{k})$ is

\[
\widetilde{V}_{k}^{\rm robust}=N_{k}\widetilde{Q}_{k}^{-1}\widetilde{\Sigma}_{k}^{\rm robust}\widetilde{Q}_{k}^{-1}.
\]

As we have established before,

\[
\begin{aligned}
\widetilde{U}_{k,i} &=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})\\
&\quad-(\widehat{\tau}_{k}^{\,{\rm fixed}}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m}.
\end{aligned}
\]

For $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}\geq 1$, let

\[
\overline{U}_{k,i}=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\tau_{k})(W_{k,i}-\overline{W}_{k,m}),
\]

and let $\overline{U}_{k,i}=0$ for $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}=0$. Then, for $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}\geq 1$, we have

\[
\widetilde{U}_{k,i}-\overline{U}_{k,i}=-(\widehat{\tau}_{k}^{\,{\rm fixed}}-\tau_{k})(W_{k,i}-\overline{W}_{k,m})-(\widehat{\alpha}_{k,m}-\alpha_{k,m})-(\widehat{\tau}_{k,m}-\tau_{k,m})\overline{W}_{k,m},
\]

and

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\widetilde{U}_{k,i}^{2}=\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\Big(\overline{U}_{k,i}+\big(\widetilde{U}_{k,i}-\overline{U}_{k,i}\big)\Big)^{2}. \tag{A.17}
\]

Focusing on the part of the right-hand side of the last equation that depends on the first term of $\widetilde{U}_{k,i}-\overline{U}_{k,i}$, we obtain

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{4}(\widehat{\tau}_{k}^{\,{\rm fixed}}-\tau_{k})^{2}\leq(\widehat{\tau}_{k}^{\,{\rm fixed}}-\tau_{k})^{2}\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\stackrel{p}{\longrightarrow}0.
\]

We will focus now on the part of the right-hand side of equation (A.17) that depends on the second term of $\widetilde{U}_{k,i}-\overline{U}_{k,i}$,

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}^{2}(\widehat{\alpha}_{k,m}-\alpha_{k,m})^{2}.
\]

Using the formula for the variance of a sample mean under sampling without replacement (e.g., in the supplement of Abadie et al., 2020), we obtain, for $1\leq n\leq n_{k,m}-1$,

\[
\begin{aligned}
E\Bigg[(\widehat{\alpha}_{k,m}-\alpha_{k,m})^{2}\sum_{i=1}^{n_{k}}&1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}^{2}\Big|N_{k,m,0}=n\Bigg]\\
&=E\Bigg[(\widehat{\alpha}_{k,m}-\alpha_{k,m})^{2}\overline{N}_{k,m}\overline{W}_{k,m}(1-\overline{W}_{k,m})\Big|N_{k,m,0}=n\Bigg]\\
&\leq E\Big[n(\widehat{\alpha}_{k,m}-\alpha_{k,m})^{2}\big|N_{k,m,0}=n\Big]\\
&=n\,\mbox{var}(\widehat{\alpha}_{k,m}|N_{k,m,0}=n)\\
&=s^{2}_{k,m,0}\Big(1-\frac{n}{n_{k,m}}\Big),
\end{aligned}
\tag{A.18}
\]

where

\[
s^{2}_{k,m,0}=\frac{1}{n_{k,m}-1}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(y_{k,i}(0)-\alpha_{k,m})^{2}.
\]
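The last equality in (A.18) is the classical finite-population result: for a simple random sample of size $n$ drawn without replacement from $N$ units, $n$ times the variance of the sample mean equals $s^{2}(1-n/N)$. The following is a minimal Python sketch that checks this identity by exact enumeration. The population `y` is made up for illustration and stands in for the control outcomes $y_{k,i}(0)$ of a single cluster; the function names are hypothetical and not part of the paper.

```python
from fractions import Fraction
from itertools import combinations


def scaled_variance_of_sample_mean(y, n):
    """n * Var(sample mean) over all size-n samples drawn without replacement."""
    y = [Fraction(v) for v in y]
    pop_mean = sum(y) / len(y)
    samples = list(combinations(y, n))  # every subset is equally likely
    sq_dev = [(sum(s) / n - pop_mean) ** 2 for s in samples]
    return n * sum(sq_dev) / len(samples)


def finite_population_formula(y, n):
    """s^2 * (1 - n/N), with s^2 using the N - 1 divisor as in the text."""
    y = [Fraction(v) for v in y]
    N = len(y)
    pop_mean = sum(y) / N
    s2 = sum((v - pop_mean) ** 2 for v in y) / (N - 1)
    return s2 * (1 - Fraction(n, N))


y = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical outcomes for one cluster
for n in range(1, len(y)):
    assert scaled_variance_of_sample_mean(y, n) == finite_population_formula(y, n)
```

Because the right-hand side never exceeds $s^{2}$, a bound on $s^{2}_{k,m,0}$ carries over directly to the conditional expectation in (A.18).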

Because $s^{2}_{k,m,0}$ is bounded, so is the right-hand side of equation (A.18). As a result

\[
E\Bigg[\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\widetilde{W}_{k,i}^{2}(\widehat{\alpha}_{k,m}-\alpha_{k,m})^{2}\Bigg]\leq c\,\frac{m_{k}}{n_{k}p_{k}}\longrightarrow 0.
\]

An analogous derivation applies to the part of the right-hand side of equation (A.17) that depends on the third term of $\widetilde{U}_{k,i}-\overline{U}_{k,i}$. (Notice that $\overline{W}_{k,m}\leq 1$ and that $\widehat{\tau}_{k,m}-\tau_{k,m}$ is equal to minus the difference between $\widehat{\alpha}_{k,m}-\alpha_{k,m}$ and the analogous difference for the treated.)

Therefore, we will study the behavior of

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\overline{U}_{k,i}^{2}. \tag{A.19}
\]

First, notice that

\[
\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}}&\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Big|(W_{k,i}-\overline{W}_{k,m})^{2}-(W_{k,i}-A_{k,m})^{2}\Big|W_{k,i}e^{2}_{k,i}(1)\\
&\leq c\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Big|(W_{k,i}-\overline{W}_{k,m})+(W_{k,i}-A_{k,m})\Big|\big|\overline{W}_{k,m}-A_{k,m}\big|W_{k,i}\\
&\leq c\Bigg(\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Big((W_{k,i}-\overline{W}_{k,m})+(W_{k,i}-A_{k,m})\Big)^{2}W_{k,i}\Bigg)^{1/2}\\
&\quad\times\Bigg(\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(\overline{W}_{k,m}-A_{k,m}\big)^{2}\Bigg)^{1/2}.
\end{aligned}
\tag{A.20}
\]

The inside of the first square root in equation (A.20) is bounded by a constant times

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i},
\]

which converges in probability to one. The expectation of the inside of the second square root in equation (A.20) is

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}E\big[\overline{N}_{k,m}\big(\overline{W}_{k,m}-A_{k,m}\big)^{2}\big]\leq c\frac{m_{k}}{n_{k}p_{k}}\longrightarrow 0.
\]

As a result, the right-hand side of equation (A.20) converges to zero in probability. The derivation with $(1-W_{k,i})e^{2}_{k,i}(0)$ replacing $W_{k,i}e^{2}_{k,i}(1)$ in equation (A.20) is analogous. Now, notice that

\[
\begin{aligned}
&(W_{k,i}-\overline{W}_{k,m})^{4}-(W_{k,i}-A_{k,m})^{4}\\
&\quad=-\big((W_{k,i}-\overline{W}_{k,m})^{2}+(W_{k,i}-A_{k,m})^{2}\big)\big((W_{k,i}-\overline{W}_{k,m})+(W_{k,i}-A_{k,m})\big)(\overline{W}_{k,m}-A_{k,m}).
\end{aligned}
\]

Because the first factor of the expression above is bounded, we obtain

\[
\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}}&\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Big|(W_{k,i}-\overline{W}_{k,m})^{4}-(W_{k,i}-A_{k,m})^{4}\Big|(\tau_{k,m}-\tau_{k})^{2}\\
&\leq c\Bigg(\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigg)^{1/2}\\
&\quad\times\Bigg(\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\big(\overline{W}_{k,m}-A_{k,m}\big)^{2}\Bigg)^{1/2}.
\end{aligned}
\tag{A.21}
\]

Now, the right-hand side of equation (A.21) converges to zero in probability by the same argument as for equation (A.20). The Cauchy-Schwarz inequality implies

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{i=1}^{n_{k}}R_{k,i}\widetilde{W}_{k,i}^{2}\overline{U}_{k,i}^{2}=\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-A_{k,m})^{2}\breve{U}_{k,i}^{2}+o_{p}(1),
\]

where

\[
\breve{U}_{k,i}=e_{k,i}(1)W_{k,i}+e_{k,i}(0)(1-W_{k,i})+(\tau_{k,m}-\tau_{k})(W_{k,i}-A_{k,m}), \tag{A.22}
\]

for $m_{k,i}=m$ and $N_{k,m,0}N_{k,m,1}\geq 1$, and $\breve{U}_{k,i}=0$ for $N_{k,m,0}N_{k,m,1}=0$. Therefore, we will study the behavior of

\[
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-A_{k,m})^{2}\breve{U}_{k,i}^{2}.
\]

We know,

\[
\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&(W_{k,i}-A_{k,m})^{2}W_{k,i}e^{2}_{k,i}(1)\\
&-E[A_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(1)\stackrel{p}{\longrightarrow}0,
\end{aligned}
\]

and

\[
\begin{aligned}
\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&(W_{k,i}-A_{k,m})^{2}(1-W_{k,i})e^{2}_{k,i}(0)\\
&-E[A^{2}_{k,m}(1-A_{k,m})]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(0)\stackrel{p}{\longrightarrow}0.
\end{aligned}
\]

Now, notice that

\[
\begin{aligned}
E[(W_{k,i}-A_{k,m})^{4}|m_{k,i}=m,R_{k,i}=1,A_{k,m}=a] &=(1-a)^{4}a+a^{4}(1-a)\\
&=a(1-a)[(1-a)^{3}+a^{3}]\\
&=a(1-a)[1-3a(1-a)],
\end{aligned}
\]

which implies

\[
\begin{aligned}
E\Bigg[\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&(W_{k,i}-A_{k,m})^{4}(\tau_{k,m}-\tau_{k})^{2}\Bigg]\\
&=n_{k,m}p_{k}q_{k}E[A_{k,m}(1-A_{k,m})(1-3A_{k,m}(1-A_{k,m}))](\tau_{k,m}-\tau_{k})^{2},
\end{aligned}
\]

and

\[
\begin{aligned}
E\Bigg[\frac{1}{n_{k}p_{k}q_{k}}\sum_{m=1}^{m_{k}}&\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-A_{k,m})^{4}(\tau_{k,m}-\tau_{k})^{2}\Bigg]\\
&=E[A_{k,m}(1-A_{k,m})(1-3A_{k,m}(1-A_{k,m}))]\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{aligned}
\]
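The conditional fourth moment used in this step relies only on $W_{k,i}$ being a Bernoulli draw with success probability $a$ once we condition on $A_{k,m}=a$. The simplification $(1-a)^{3}+a^{3}=1-3a(1-a)$ can be confirmed with a short Python check in exact rational arithmetic. This is a sketch for illustration; the function names are hypothetical.

```python
from fractions import Fraction


def fourth_moment_direct(a):
    """E[(W - a)^4] for W ~ Bernoulli(a), summing over the two outcomes."""
    return a * (1 - a) ** 4 + (1 - a) * a ** 4


def fourth_moment_closed_form(a):
    """The simplified expression a(1-a)[1 - 3a(1-a)] from the text."""
    return a * (1 - a) * (1 - 3 * a * (1 - a))


# Compare the two expressions on a grid of assignment probabilities.
for num in range(0, 11):
    a = Fraction(num, 10)
    assert fourth_moment_direct(a) == fourth_moment_closed_form(a)
```

At $a=1/2$, for instance, both expressions equal $1/16$.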

Notice now that

\[
\begin{aligned}
\frac{1}{(n_{k}p_{k}q_{k})^{2}}\sum_{m=1}^{m_{k}}&E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}(W_{k,i}-A_{k,m})^{4}(\tau_{k,m}-\tau_{k})^{2}\Bigg)^{2}\Bigg]\\
&\leq c\,\frac{1}{(n_{k}p_{k}q_{k})^{2}}\sum_{m=1}^{m_{k}}E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}\Bigg)^{2}\Bigg]\\
&\leq c\,\frac{q_{k}}{(n_{k}p_{k}q_{k})^{2}}\sum_{m=1}^{m_{k}}(n_{k,m}p_{k}+n_{k,m}^{2}p_{k}^{2})\\
&=c\Bigg(\frac{1}{n_{k}p_{k}q_{k}}+\frac{\max_{m}n_{k,m}}{\min_{m}n_{k,m}}\frac{1}{m_{k}q_{k}}\Bigg)\stackrel{p}{\longrightarrow}0.
\end{aligned}
\]

Notice also that expectations of the sums of products of the terms on the right-hand side of equation (A.22) are equal to zero. Then,

\[
\frac{1}{n_{k}p_{k}q_{k}}\widetilde{\Sigma}_{k}^{\rm robust}-f_{k}^{\rm robust}\stackrel{p}{\longrightarrow}0,
\]

where

\[
\begin{aligned}
f_{k}^{\rm robust} &=E[A_{k,m}(1-A_{k,m})^{2}]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(1)+E[A^{2}_{k,m}(1-A_{k,m})]\frac{1}{n_{k}}\sum_{i=1}^{n_{k}}e^{2}_{k,i}(0)\\
&\quad+E[A_{k,m}(1-A_{k,m})(1-3A_{k,m}(1-A_{k,m}))]\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{aligned}
\]

Now let,

\[
\tilde{v}_{k}^{\rm robust}=f_{k}^{\rm robust}/(\mu_{k}(1-\mu_{k})-\sigma^{2}_{k})^{2}.
\]

We obtain,

\[
\widetilde{V}_{k}^{\rm robust}=\tilde{v}_{k}^{\rm robust}+o_{p}(1).
\]
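To make the limit $\tilde{v}_{k}^{\rm robust}$ concrete, the sketch below evaluates $f_{k}^{\rm robust}$ and $\tilde{v}_{k}^{\rm robust}$ for a toy population. It rests on several assumptions that are not spelled out in this section: $A_{k,m}$ is taken to follow a discrete distribution common to all clusters (passed in as `support` and `probs`), $\mu_{k}$ and $\sigma^{2}_{k}$ are taken to be the mean and variance of that distribution, and $\tau_{k}$ is supplied by the caller. All function and argument names are hypothetical.

```python
def expect(g, support, probs):
    """E[g(A)] for a discrete distribution of the assignment probability A."""
    return sum(p * g(a) for a, p in zip(support, probs))


def f_robust(support, probs, e1, e0, cluster_sizes, tau_m, tau):
    """Evaluate f_k^robust term by term, following the display above."""
    n = len(e1)
    mean_e1_sq = sum(x * x for x in e1) / n
    mean_e0_sq = sum(x * x for x in e0) / n
    # Size-weighted dispersion of cluster effects around tau.
    het = sum(nm / n * (t - tau) ** 2 for nm, t in zip(cluster_sizes, tau_m))
    return (
        expect(lambda a: a * (1 - a) ** 2, support, probs) * mean_e1_sq
        + expect(lambda a: a ** 2 * (1 - a), support, probs) * mean_e0_sq
        + expect(lambda a: a * (1 - a) * (1 - 3 * a * (1 - a)), support, probs) * het
    )


def v_robust(support, probs, e1, e0, cluster_sizes, tau_m, tau):
    """f_k^robust divided by (mu(1-mu) - sigma^2)^2."""
    mu = expect(lambda a: a, support, probs)
    sigma2 = expect(lambda a: (a - mu) ** 2, support, probs)
    return f_robust(support, probs, e1, e0, cluster_sizes, tau_m, tau) / (
        mu * (1 - mu) - sigma2
    ) ** 2


# Toy inputs: two clusters of two units, residuals summing to zero within clusters.
e1 = [1.0, -1.0, 2.0, -2.0]
e0 = [1.0, -1.0, 1.0, -1.0]
v = v_robust([0.5], [1.0], e1, e0, [2, 2], [1.0, 1.0], 1.0)
assert abs(v - 7.0) < 1e-12
```

With $A_{k,m}$ fixed at $1/2$ and homogeneous effects, the value reduces to $2\big(n_{k}^{-1}\sum_{i}e^{2}_{k,i}(1)+n_{k}^{-1}\sum_{i}e^{2}_{k,i}(0)\big)=7$ for these inputs, which matches the familiar robust variance under a constant assignment probability of one half.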

A.3.3 Large $k$ results for the fixed effects case under homogeneous average treatment effects across clusters

We will now study Lyapounov's condition for the case $\tau_{k,m}=\tau_{k}$ for all $k$ and $m=1,\ldots,m_{k}$, so

\[
f_{k}=\sum_{m=1}^{m_{k}}E[\psi_{k,m}^{2}].
\]

Notice that

\[
\begin{aligned}
\sum_{m=1}^{m_{k}}E[\psi_{k,m}^{2}] &\geq\frac{1}{n_{k}}E[A_{k,m}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(1)\\
&\quad+\frac{1}{n_{k}}E[A^{2}_{k,m}(1-A_{k,m})]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(0)\\
&\quad-\frac{1}{n_{k}}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(e_{k,i}(1)-e_{k,i}(0))^{2}\\
&=\frac{1}{n_{k}}E[A_{k,m}(1-A_{k,m})^{3}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(1)\\
&\quad+\frac{1}{n_{k}}E[A^{3}_{k,m}(1-A_{k,m})]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}^{2}(0)\\
&\quad+\frac{2}{n_{k}}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}e_{k,i}(1)e_{k,i}(0)\\
&=E\left[\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}A^{3}_{k,m}(1-A_{k,m})^{3}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\left(\frac{e_{k,i}(1)}{A_{k,m}}+\frac{e_{k,i}(0)}{1-A_{k,m}}\right)^{2}\right].
\end{aligned}
\]
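The two equalities in this display are pointwise algebraic identities in $A_{k,m}$, $e_{k,i}(1)$ and $e_{k,i}(0)$, applied inside the expectation and the sums. A minimal Python sketch that checks them on randomly drawn rational inputs follows. The helper names are hypothetical, and `a`, `e1`, `e0` stand for a single value of $A_{k,m}$, $e_{k,i}(1)$ and $e_{k,i}(0)$.

```python
from fractions import Fraction
import random


def lower_bound_terms(a, e1, e0):
    """Per-unit integrand of the three terms after the inequality sign."""
    return (
        a * (1 - a) ** 2 * e1 ** 2
        + a ** 2 * (1 - a) * e0 ** 2
        - a ** 2 * (1 - a) ** 2 * (e1 - e0) ** 2
    )


def expanded_terms(a, e1, e0):
    """Per-unit integrand after collecting terms (middle three lines)."""
    return (
        a * (1 - a) ** 3 * e1 ** 2
        + a ** 3 * (1 - a) * e0 ** 2
        + 2 * a ** 2 * (1 - a) ** 2 * e1 * e0
    )


def square_form(a, e1, e0):
    """Per-unit integrand of the final line, a nonnegative perfect square."""
    return a ** 3 * (1 - a) ** 3 * (e1 / a + e0 / (1 - a)) ** 2


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(200):
    a = Fraction(rng.randint(1, 99), 100)  # strictly between zero and one
    e1 = Fraction(rng.randint(-50, 50), 7)
    e0 = Fraction(rng.randint(-50, 50), 7)
    assert lower_bound_terms(a, e1, e0) == expanded_terms(a, e1, e0)
    assert expanded_terms(a, e1, e0) == square_form(a, e1, e0)
    assert square_form(a, e1, e0) >= 0
```

Writing the bound as a perfect square makes it clear that it is nonnegative, and that it vanishes only when $e_{k,i}(1)/A_{k,m}=-e_{k,i}(0)/(1-A_{k,m})$.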

Therefore,

\[
\liminf_{k\rightarrow\infty}E\left[\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}A^{3}_{k,m}(1-A_{k,m})^{3}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\left(\frac{e_{k,i}(1)}{A_{k,m}}+\frac{e_{k,i}(0)}{1-A_{k,m}}\right)^{2}\right]>0
\]

is sufficient for $\liminf_{k\rightarrow\infty}f_{k}>0$ (even if condition (A.11) does not hold). Given our assumption that the supports of the cluster probabilities, $A_{k,m}$, are bounded away from zero and one (uniformly in $k$ and $m$), then

\[
\liminf_{k\rightarrow\infty}E\left[\frac{1}{n_{k}}\sum_{m=1}^{m_{k}}\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}\left(\frac{e_{k,i}(1)}{A_{k,m}}+\frac{e_{k,i}(0)}{1-A_{k,m}}\right)^{2}\right]>0 \tag{A.23}
\]

is sufficient for $\liminf_{k\rightarrow\infty}f_{k}>0$. Assume that (A.23) holds, so $\liminf_{k\rightarrow\infty}f_{k}>0$. We now obtain,

\[
\begin{aligned}
E\Bigg[\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&W_{k,i}(W_{k,i}-\overline{W}_{k,m})e_{k,i}(1)\Bigg|^{4}\,\Big|\,Q_{k,m}=1,A_{k,m}\Bigg]\\
&=E\Bigg[(1-\overline{W}_{k,m})^{4}\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}e_{k,i}(1)\Bigg|^{4}\,\Big|\,Q_{k,m}=1,A_{k,m}\Bigg]\\
&\leq E\Bigg[\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}e_{k,i}(1)\Bigg|^{4}\,\Big|\,Q_{k,m}=1,A_{k,m}\Bigg],
\end{aligned}
\]

and

\[
\begin{aligned}
E\Bigg[\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}&W_{k,i}e_{k,i}(1)\Bigg|^{4}\,\Big|\,Q_{k,m}=1,A_{k,m}\Bigg]\\
&=E\Bigg[\Bigg|\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}(R_{k,i}W_{k,i}-p_{k}A_{k,m})e_{k,i}(1)\Bigg|^{4}\,\Big|\,Q_{k,m}=1,A_{k,m}\Bigg]\\
&=n_{k,m}E[(R_{k,i}W_{k,i}-p_{k}A_{k,m})^{4}|Q_{k,m}=1,A_{k,m}]\\
&\quad+3n_{k,m}(n_{k,m}-1)(E[(R_{k,i}W_{k,i}-p_{k}A_{k,m})^{2}|Q_{k,m}=1,A_{k,m}])^{2}.
\end{aligned}
\]

The first equality holds because the terms $e_{k,i}(1)$ sum to zero within clusters. The second equality holds because, if $m_{k,i}=m_{k,j}=m$ with $i\neq j$, then $R_{k,i}W_{k,i}$ and $R_{k,j}W_{k,j}$ are independent conditional on $Q_{k,m}=1, A_{k,m}$, and $E[R_{k,i}W_{k,i}-p_kA_{k,m}\,|\,Q_{k,m}=1,A_{k,m}]=0$. Notice that

\[
E[(R_{k,i}W_{k,i}-p_kA_{k,m})^2\,|\,Q_{k,m}=1,A_{k,m}] = p_kA_{k,m}(1-p_kA_{k,m}) \leq p_k,
\]

which also implies $E[(R_{k,i}W_{k,i}-p_kA_{k,m})^4\,|\,Q_{k,m}=1,A_{k,m}]\leq p_k$. As a result,

\begin{align*}
&\sum_{m=1}^{m_k} E\Bigg[\Bigg|\frac{1}{\sqrt{n_kp_kq_k}}\sum_{i=1}^{n_k} 1\{m_{k,i}=m\} R_{k,i}W_{k,i}\, e_{k,i}(1)\Bigg|^4\Bigg] \\
&\quad \leq \frac{1}{n_kp_kq_k} + 3\,\frac{\max_m n_{k,m}}{\min_m n_{k,m}}\,\frac{1}{m_kq_k} \rightarrow 0.
\end{align*}

A.4 Derivations of the variance estimators

In this section, we derive the adjustments in the CCV variance. (We do this under the assumption that the $Z_i$ are independent. In our simulations we actually use a slightly different sampling scheme for the $Z_i$ where the average $\overline{Z}_{k,m}$ is identical and fixed in each cluster.) To derive the CCV variance of the least squares estimator, consider first a variance estimator of the form

\[
\left(\sum_{i=1}^{n} V_i\right)^2.
\]

We aim, however, to design an estimator based on a subsample consisting of units with $Z_i=1$, where $Z_i\in\{0,1\}$ is i.i.d. binary with $\Pr(Z_i=1)=p_Z$ and independent of $V_i$. First, notice that

\[
E\left[\left(\sum_{i=1}^{n}V_i\right)^2\right] = \sum_{i=1}^{n}E[V_i^2] + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[V_iV_j],
\]

and

\[
E\left[\left(\sum_{i=1}^{n}Z_iV_i\right)^2\right] = p_Z\sum_{i=1}^{n}E[V_i^2] + 2p_Z^2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[V_iV_j].
\]

Therefore,

\[
E\left[\frac{1}{p_Z}\left(\sum_{i=1}^{n}Z_iV_i\right)^2\right] = \sum_{i=1}^{n}E[V_i^2] + 2p_Z\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[V_iV_j],
\]

and

\[
\frac{(1-p_Z)}{p_Z^2}\left(E\left[\left(\sum_{i=1}^{n}Z_iV_i\right)^2\right] - p_Z\sum_{i=1}^{n}E[V_i^2]\right) = 2(1-p_Z)\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[V_iV_j].
\]

Adding the last two equations,

\begin{align*}
E\left[\left(\sum_{i=1}^{n}V_i\right)^2\right] &= \frac{1}{p_Z^2}E\left[\left(\sum_{i=1}^{n}Z_iV_i\right)^2\right] - \frac{(1-p_Z)}{p_Z}\sum_{i=1}^{n}E[V_i^2] \\
&= \frac{1}{p_Z^2}E\left[\left(\sum_{i=1}^{n}Z_iV_i\right)^2\right] - \frac{(1-p_Z)}{p_Z^2}\sum_{i=1}^{n}E[Z_iV_i^2]. \tag{A.24}
\end{align*}

The first term of the CCV variance estimator for least squares is based on the sample counterpart of the right-hand side of equation (A.24), with $1\{m_{k,i}=m\}R_{k,i}\big((W_{k,i}-\overline{W}_k)\widehat{U}_{k,i}-(\widehat{\tau}_{k,m}-\widehat{\tau}_k)\overline{W}_k(1-\overline{W}_k)\big)$ in the role of $V_i$.
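Identity (A.24) can be checked numerically. The sketch below is an illustration only, not part of the paper's derivation: it treats the $V_i$ as fixed numbers, so that the only randomness comes from the i.i.d. indicators $Z_i$, and computes both expectations on the right-hand side exactly by enumerating all $2^n$ values of $(Z_1,\ldots,Z_n)$. The function name `check_subsample_identity` and the example inputs are hypothetical.

```python
from itertools import product


def check_subsample_identity(v, p_z):
    """Return (lhs, rhs) of identity (A.24) for fixed values v and Pr(Z_i = 1) = p_z.

    Assumption for this sketch: the V_i are non-random, so the left-hand side
    E[(sum_i V_i)^2] is simply (sum_i V_i)^2.
    """
    n = len(v)
    e_sum_zv_sq = 0.0   # E[(sum_i Z_i V_i)^2]
    e_sum_zv2 = 0.0     # sum_i E[Z_i V_i^2]
    for z in product((0, 1), repeat=n):
        # Probability of this configuration of independent Bernoulli(p_z) draws.
        prob = 1.0
        for zi in z:
            prob *= p_z if zi else 1.0 - p_z
        e_sum_zv_sq += prob * sum(zi * vi for zi, vi in zip(z, v)) ** 2
        e_sum_zv2 += prob * sum(zi * vi ** 2 for zi, vi in zip(z, v))
    lhs = sum(v) ** 2
    rhs = e_sum_zv_sq / p_z ** 2 - (1.0 - p_z) / p_z ** 2 * e_sum_zv2
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = check_subsample_identity([1.0, -2.0, 0.5, 3.0], p_z=0.3)
    print(lhs, rhs)  # the two numbers agree up to floating-point error
```

The enumeration is exponential in $n$, so this is only meant for small examples; it plays no role in the estimator itself.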

To derive the CCV variance estimator for the fixed effect case, consider

\[
\lambda_k = 1 - q_k\frac{(E[A_{k,m}(1-A_{k,m})])^2}{E[A_{k,m}^2(1-A_{k,m})^2]},
\]

and let $f_k^{\rm CCV}=\lambda_kf_k^{\rm cluster}+(1-\lambda_k)f_k^{\rm robust}$. This transformation is designed to reproduce the terms in $f_k$ with factor

\[
\sum_{m=1}^{m_k}\frac{n^2_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2.
\]

These terms dominate $f_k$ as $k$ increases. The transformation also reproduces several lower-order terms.
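To get a feel for the weight $\lambda_k$, the following sketch evaluates it for a discrete distribution of the cluster assignment probability $A_{k,m}$. The function name `ccv_lambda` and the representation of the distribution as (value, probability) pairs are assumptions made for illustration. Because $(E[X])^2\leq E[X^2]$ for $X=A_{k,m}(1-A_{k,m})$, the weight always lies between $1-q_k$ and $1$.

```python
def ccv_lambda(q, a_dist):
    """Weight lambda_k = 1 - q * (E[A(1-A)])^2 / E[A^2 (1-A)^2].

    q      : fraction of clusters sampled (q_k in the text).
    a_dist : list of (value, probability) pairs describing A_{k,m}
             (a hypothetical input format chosen for this sketch).
    """
    first = sum(w * a * (1.0 - a) for a, w in a_dist)          # E[A(1-A)]
    second = sum(w * (a * (1.0 - a)) ** 2 for a, w in a_dist)  # E[A^2 (1-A)^2]
    if second == 0.0:
        raise ValueError("E[A^2 (1-A)^2] must be positive")
    return 1.0 - q * first ** 2 / second


if __name__ == "__main__":
    # Degenerate A (no variation across clusters): lambda equals 1 - q.
    print(ccv_lambda(0.3, [(0.5, 1.0)]))
    # Heterogeneous A: lambda moves toward 1, putting more weight on f^cluster.
    print(ccv_lambda(0.3, [(0.1, 0.5), (0.5, 0.5)]))
```

When all clusters are sampled and $A_{k,m}(1-A_{k,m})$ is constant, the weight on $f_k^{\rm cluster}$ is zero in this sketch, so $f_k^{\rm CCV}$ reduces to $f_k^{\rm robust}$.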

Notice that

\begin{align*}
f_k^{\rm robust} &= E[A_{k,m}(1-A_{k,m})^2]\frac{1}{n_k}\sum_{i=1}^{n_k}e^2_{k,i}(1) + E[A^2_{k,m}(1-A_{k,m})]\frac{1}{n_k}\sum_{i=1}^{n_k}e^2_{k,i}(0) \\
&\quad + \Big(E[A_{k,m}(1-A_{k,m})] - (5+p_k)E[A^2_{k,m}(1-A_{k,m})^2]\Big)\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2 \\
&\quad + (2+p_k)E[A^2_{k,m}(1-A_{k,m})^2]\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2.
\end{align*}

Then,

\begin{align*}
f_k^{\rm CCV}-f_k &= (1-\lambda_k)\,p_k\,E[A^2_{k,m}(1-A_{k,m})^2]\left(\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2 + \frac{1}{n_k}\sum_{i=1}^{n_k}(e_{k,i}(1)-e_{k,i}(0))^2\right) \\
&= p_kq_k(E[A_{k,m}(1-A_{k,m})])^2\left(\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2 + \frac{1}{n_k}\sum_{i=1}^{n_k}(e_{k,i}(1)-e_{k,i}(0))^2\right).
\end{align*}

For $\tilde{v}_k^{\rm CCV}=f_k^{\rm CCV}/(\mu_k(1-\mu_k)-\sigma^2_k)^2$, we obtain

\[
\tilde{v}_k^{\rm CCV}-\tilde{v}_k = p_kq_k\sum_{m=1}^{m_k}\frac{n_{k,m}}{n_k}(\tau_{k,m}-\tau_k)^2 + p_kq_k\frac{1}{n_k}\sum_{i=1}^{n_k}(e_{k,i}(1)-e_{k,i}(0))^2. \tag{A.25}
\]

The difference $\tilde{v}_k^{\rm CCV}-\tilde{v}_k$ is non-negative and of smaller order than $\tilde{v}_k$. Therefore, $\tilde{v}_k^{\rm CCV}/\tilde{v}_k\rightarrow 1$ (even if $\tilde{v}_k^{\rm CCV}-\tilde{v}_k$ is bounded away from zero). The first term on the right-hand side of (A.25) could be estimated to further correct the difference between the CCV estimator and the variance of $\widehat{\tau}_k^{\rm fixed}$.

A.5 Limit results

Let $X_{k,m}$ be an infinite array of random variables, with rows indexed by $k=1,2,\ldots$, and the columns of the $k$-th row indexed by $m=1,\ldots,m_k$. Let

\[
S_k=\sum_{m=1}^{m_k}X_{k,m},
\]

and $a_k=E[S_k]$.

A Weak Law of Large Numbers for Arrays: For each $k=1,2,\ldots$, suppose that $X_{k,1},\ldots,X_{k,m_k}$ are independent and have finite second moments. In addition, let $b_k$ be a sequence of positive constants such that

\[
\frac{1}{b_k^2}\sum_{m=1}^{m_k}E[X_{k,m}^2]\longrightarrow 0.
\]

Then,

\[
\frac{S_k-a_k}{b_k}\stackrel{p}{\longrightarrow}0.
\]

Proof: By Chebyshev’s inequality, for any $\varepsilon>0$,

\begin{align*}
\Pr\left(\left|\frac{S_k-a_k}{b_k}\right|>\varepsilon\right) &\leq \frac{1}{b_k^2\varepsilon^2}\mbox{var}(S_k) \\
&= \frac{1}{b_k^2\varepsilon^2}\sum_{m=1}^{m_k}\mbox{var}(X_{k,m}) \\
&\leq \frac{1}{b_k^2\varepsilon^2}\sum_{m=1}^{m_k}E[X_{k,m}^2]\longrightarrow 0.
\end{align*}

$\square$

A Central Limit Theorem for Arrays: For each $k=1,2,\ldots$, suppose that $X_{k,1},\ldots,X_{k,m_k}$ are independent, with zero means, $E[X_{k,m}]=0$, and finite variances, $\sigma^2_{k,m}=E[X^2_{k,m}]$, for $m=1,\ldots,m_k$. Let

\[
s_k^2=\sum_{m=1}^{m_k}\sigma^2_{k,m}.
\]

Assume also that Lyapunov’s condition holds,

\[
\lim_{k\rightarrow\infty}\frac{1}{s_k^{2+\delta}}\sum_{m=1}^{m_k}E[|X_{k,m}|^{2+\delta}]=0,
\]

for some $\delta>0$. Then,

\[
\frac{S_k}{s_k}\stackrel{d}{\longrightarrow}N(0,1).
\]

Proof: See Billingsley, Chapter 27.
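As a simple illustration of the Lyapunov condition, consider a row of $m_k$ i.i.d. centered Bernoulli($p$) variables and take $\delta=2$. This example is an assumption chosen for concreteness and is not the array used in the proofs above. The ratio in the condition then equals $(1-3p(1-p))/(m_kp(1-p))$, which vanishes as $m_k\rightarrow\infty$ for fixed $p$. The function name `lyapunov_ratio` is hypothetical.

```python
def lyapunov_ratio(p, m):
    """Lyapunov ratio with delta = 2 for m i.i.d. centered Bernoulli(p) terms.

    Computes (1 / s^4) * sum_m E[|X_m|^4], where X_m = B_m - p and
    s^2 = m * p * (1 - p).
    """
    second = p * (1.0 - p)                              # E[X^2]
    fourth = p * (1.0 - p) ** 4 + (1.0 - p) * p ** 4    # E[X^4]
    s_sq = m * second
    return m * fourth / s_sq ** 2


if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        print(m, lyapunov_ratio(0.5, m))  # decreases at rate 1/m
```

If $p$ shrinks with $m_k$, the ratio behaves like $1/(m_kp)$, so the condition requires the expected number of nonzero draws to diverge. This parallels the rate conditions on $n_kp_kq_k$ and $m_kq_k$ that appear in the bounds of this appendix.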

A.6 Intermediate calculations for Section A.2

The calculation of $v_k$ uses the following results.

\begin{align*}
E[(R_{k,i}W_{k,i}-p_kq_k\mu_k)^2] &= p_kq_k\mu_k(1-p_kq_k\mu_k), \\
E[(R_{k,i}(1-W_{k,i})-p_kq_k(1-\mu_k))^2] &= p_kq_k(1-\mu_k)(1-p_kq_k(1-\mu_k)), \\
E[(R_{k,i}W_{k,i}-p_kq_k\mu_k)(R_{k,i}(1-W_{k,i})-p_kq_k(1-\mu_k))] &= -p_k^2q_k^2\mu_k(1-\mu_k), \\
E[R_{k,i}W_{k,i}R_{k,j}W_{k,j}\,|\,m_{k,i}=m_{k,j}] &= E[p_k^2q_kA_{k,m}^2] = p_k^2q_k(\sigma_k^2+\mu_k^2),
\end{align*}

and

\begin{align*}
E[(R_{k,i}W_{k,i}-p_kq_k\mu_k)(R_{k,j}W_{k,j}-p_kq_k\mu_k)\,|\,m_{k,i}=m_{k,j}] &= p_k^2q_k(\sigma_k^2+\mu_k^2)-(p_kq_k\mu_k)^2 \\
&= p_k^2q_k(\sigma_k^2+(1-q_k)\mu_k^2).
\end{align*}

Similarly,

\begin{align*}
&E[(R_{k,i}(1-W_{k,i})-p_kq_k(1-\mu_k))(R_{k,j}(1-W_{k,j})-p_kq_k(1-\mu_k))\,|\,m_{k,i}=m_{k,j}] \\
&\quad = p_k^2q_k(\sigma_k^2+(1-q_k)(1-\mu_k)^2).
\end{align*}

Notice also that

\begin{align*}
E[R_{k,i}W_{k,i}R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] &= E[p_k^2q_kA_{k,m}(1-A_{k,m})] \\
&= p_k^2q_k(\mu_k(1-\mu_k)-\sigma_k^2),
\end{align*}

and

\begin{align*}
&E[(R_{k,i}W_{k,i}-p_kq_k\mu_k)(R_{k,j}(1-W_{k,j})-p_kq_k(1-\mu_k))\,|\,m_{k,i}=m_{k,j}] \\
&\quad = p_k^2q_k(\mu_k(1-\mu_k)-\sigma_k^2)-p_k^2q_k^2\mu_k(1-\mu_k) \\
&\quad = p_k^2q_k(\mu_k(1-\mu_k)(1-q_k)-\sigma_k^2).
\end{align*}

The following bounds are useful to prove Lyapunov’s condition.

\begin{align*}
E[|R_{k,i}W_{k,i}-p_kq_k\mu_k|^3] &= (1-p_kq_k\mu_k)^3p_kq_k\mu_k+(p_kq_k\mu_k)^3(1-p_kq_k\mu_k) \\
&\leq c\,p_kq_k.
\end{align*}

Let $Q_{k,m}$ be a binary indicator that takes value one if cluster $m$ of population $k$ is sampled.

\begin{align*}
&E\big[|R_{k,i}W_{k,i}-p_kq_k\mu_k|^2\,|R_{k,j}W_{k,j}-p_kq_k\mu_k| \,\big|\, m_{k,i}=m_{k,j}=m\big] \\
&\quad = E\big[\big((1-p_kq_k\mu_k)^2p_kA_{k,m}+(p_kq_k\mu_k)^2(1-p_kA_{k,m})\big) \\
&\qquad\quad \times\big((1-p_kq_k\mu_k)p_kA_{k,m}+(p_kq_k\mu_k)(1-p_kA_{k,m})\big) \,\big|\, m_{k,i}=m_{k,j}=m, Q_{k,m}=1\big]\,q_k \\
&\qquad + E\big[(p_kq_k\mu_k)^3 \,\big|\, m_{k,i}=m_{k,j}=m, Q_{k,m}=0\big](1-q_k) \\
&\quad \leq c\,p_k^2q_k.
\end{align*}

\begin{align*}
&E\big[|R_{k,i}W_{k,i}-p_kq_k\mu_k|\,|R_{k,j}W_{k,j}-p_kq_k\mu_k|\,|R_{k,t}W_{k,t}-p_kq_k\mu_k| \,\big|\, m_{k,i}=m_{k,j}=m_{k,t}=m\big] \\
&\quad = E\big[\big((1-p_kq_k\mu_k)p_kA_{k,m}+(p_kq_k\mu_k)(1-p_kA_{k,m})\big)^3 \,\big|\, m_{k,i}=m_{k,j}=m_{k,t}=m, Q_{k,m}=1\big]\,q_k \\
&\qquad + E\big[(p_kq_k\mu_k)^3 \,\big|\, m_{k,i}=m_{k,j}=m_{k,t}=m, Q_{k,m}=0\big](1-q_k) \\
&\quad \leq c\,p^3_kq_k.
\end{align*}

 

Other useful intermediate calculations.

For the moments of treatment indicators, notice that $E[(W_{k,i}-\mu_k)^2W_{k,i}]=\mu_k(1-\mu_k)^2$, and $E[(W_{k,i}-\mu_k)^2(1-W_{k,i})]=(1-\mu_k)\mu_k^2$. In addition,

\begin{align*}
E[W_{k,i}W_{k,j}\,|\,m_{k,i}=m_{k,j}] &= E[A_{k,m}^2]\quad\mbox{(for $m\in\{1,\ldots,m_k\}$)} \\
&= \sigma_k^2+\mu_k^2.
\end{align*}

Similarly, $E[(1-W_{k,i})(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}]=\sigma_k^2+(1-\mu_k)^2$. Therefore, $E[(W_{k,i}-\mu_k)W_{k,j}\,|\,m_{k,i}=m_{k,j}]=\sigma^2_k$ and $E[(W_{k,i}-\mu_k)(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}]=-\sigma^2_k$. In addition,

\begin{align*}
&E[(W_{k,i}-\mu_k)(W_{k,j}-\mu_k)W_{k,i}W_{k,j}\,|\,m_{k,i}=m_{k,j}] \\
&\quad = E[A^2_{k,m}](1-\mu_k)^2\quad\mbox{(for $m\in\{1,\ldots,m_k\}$)} \\
&\quad = (\sigma_k^2+\mu_k^2)(1-\mu_k)^2.
\end{align*}

Similarly,

\[
E[(W_{k,i}-\mu_k)(W_{k,j}-\mu_k)(1-W_{k,i})(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] = (\sigma_k^2+(1-\mu_k)^2)\mu_k^2,
\]

and

\[
E[(W_{k,i}-\mu_k)(W_{k,j}-\mu_k)W_{k,i}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] = \mu_k(1-\mu_k)(\sigma^2_k-\mu_k(1-\mu_k)).
\]

$\mbox{var}(R_{k,i}W_{k,i})=p_kq_k\mu_k(1-p_kq_k\mu_k)$, $\mbox{var}(R_{k,i}(1-W_{k,i}))=p_kq_k(1-\mu_k)(1-p_kq_k(1-\mu_k))$. Moreover,

\begin{align*}
\mbox{cov}(R_{k,i}W_{k,i},R_{k,i}(1-W_{k,i})) &= E[R_{k,i}W_{k,i}R_{k,i}(1-W_{k,i})]-E[R_{k,i}W_{k,i}]E[R_{k,i}(1-W_{k,i})] \\
&= -p_k^2q_k^2\mu_k(1-\mu_k).
\end{align*}

Recall that $E[W_{k,i}W_{k,j}\,|\,m_{k,i}=m_{k,j}]=\sigma_k^2+\mu_k^2$. Therefore, $\mbox{cov}(W_{k,i},W_{k,j}\,|\,m_{k,i}=m_{k,j})=\sigma_k^2$. Also,

\[
E[W_{k,i}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}]=\mu_k(1-\mu_k)-\sigma_k^2.
\]

\begin{align*}
E[R_{k,i}W_{k,i}R_{k,j}W_{k,j}\,|\,m_{k,i}=m_{k,j}] &= E[R_{k,i}R_{k,j}\,|\,m_{k,i}=m_{k,j}]\,E[W_{k,i}W_{k,j}\,|\,m_{k,i}=m_{k,j}] \\
&= p_k^2q_k(\sigma_k^2+\mu_k^2).
\end{align*}

Similarly,

\[
E[R_{k,i}(1-W_{k,i})R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}]=p_k^2q_k(\sigma_k^2+(1-\mu_k)^2).
\]

Therefore,

\begin{align*}
\mbox{cov}(R_{k,i}W_{k,i},R_{k,j}W_{k,j}\,|\,m_{k,i}=m_{k,j}) &= p_k^2q_k(\sigma_k^2+\mu_k^2)-p_k^2q_k^2\mu_k^2 \\
&= p_k^2q_k(\sigma_k^2+\mu_k^2(1-q_k)),
\end{align*}

and

\begin{align*}
\mbox{cov}(R_{k,i}(1-W_{k,i}),R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}) &= p_k^2q_k(\sigma_k^2+(1-\mu_k)^2)-p^2_kq^2_k(1-\mu_k)^2 \\
&= p_k^2q_k(\sigma_k^2+(1-\mu_k)^2(1-q_k)).
\end{align*}

In addition,

\begin{align*}
&\mbox{cov}(R_{k,i}W_{k,i},R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}) \\
&\quad = E[R_{k,i}W_{k,i}R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] \\
&\qquad - E[R_{k,i}W_{k,i}\,|\,m_{k,i}=m_{k,j}]\,E[R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] \\
&\quad = E[R_{k,i}R_{k,j}\,|\,m_{k,i}=m_{k,j}]\,E[W_{k,i}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] \\
&\qquad - E[R_{k,i}W_{k,i}\,|\,m_{k,i}=m_{k,j}]\,E[R_{k,j}(1-W_{k,j})\,|\,m_{k,i}=m_{k,j}] \\
&\quad = p_k^2q_k(\mu_k(1-\mu_k)-\sigma_k^2)-p_k^2q_k^2\mu_k(1-\mu_k) \\
&\quad = p_k^2q_k(\mu_k(1-\mu_k)(1-q_k)-\sigma_k^2).
\end{align*}
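The within-cluster covariances above can be verified by brute-force enumeration. The sketch below assumes the sampling and assignment scheme implied by the formulas in this section: a cluster is sampled with probability $q_k$ independently of $A_{k,m}$; conditional on the cluster being sampled, units are sampled independently with probability $p_k$; and, conditional on $A_{k,m}$, the $W_{k,i}$ are independent Bernoulli($A_{k,m}$). The function name `same_cluster_covariances` and the discrete distribution for $A_{k,m}$ are hypothetical choices made for illustration.

```python
from itertools import product


def same_cluster_covariances(p, q, a_dist):
    """Covariances of sampled-and-treated indicators for two units in one cluster.

    a_dist is a list of (value, probability) pairs for A_{k,m}.
    Returns (mu, sigma2, cov_tt, cov_tc), where
      cov_tt = cov(R_i W_i, R_j W_j)        and
      cov_tc = cov(R_i W_i, R_j (1 - W_j)).
    """
    mu = sum(w * a for a, w in a_dist)
    sigma2 = sum(w * (a - mu) ** 2 for a, w in a_dist)
    e_tt = 0.0
    e_tc = 0.0
    # Enumerate A, R_i, R_j, W_i, W_j conditional on the cluster being sampled.
    # If the cluster is not sampled, R_i = R_j = 0 and the products vanish.
    for (a, w), ri, rj, wi, wj in product(a_dist, (0, 1), (0, 1), (0, 1), (0, 1)):
        prob = q * w
        prob *= p if ri else 1.0 - p
        prob *= p if rj else 1.0 - p
        prob *= a if wi else 1.0 - a
        prob *= a if wj else 1.0 - a
        e_tt += prob * ri * wi * rj * wj
        e_tc += prob * ri * wi * rj * (1 - wj)
    cov_tt = e_tt - (p * q * mu) ** 2
    cov_tc = e_tc - (p * q * mu) * (p * q * (1.0 - mu))
    return mu, sigma2, cov_tt, cov_tc


if __name__ == "__main__":
    p, q = 0.4, 0.3
    mu, s2, cov_tt, cov_tc = same_cluster_covariances(p, q, [(0.2, 0.5), (0.8, 0.5)])
    print(cov_tt, p**2 * q * (s2 + mu**2 * (1 - q)))
    print(cov_tc, p**2 * q * (mu * (1 - mu) * (1 - q) - s2))
```

Each printed pair compares the enumerated covariance with the closed-form expression from the text; the two agree up to floating-point error.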

A.7 Intermediate calculations for Section A.3

\[
E[R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\,|\,A_{k,m},Q_{k,m}=1,m_{k,i}=m]=p_kA_{k,m}(1-A_{k,m}).
\]

This implies

\[
E[R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\,|\,m_{k,i}=m]=p_kq_kE[A_{k,m}(1-A_{k,m})].
\]

Therefore,

\[
E\Bigg[\sum_{i=1}^{n_k}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-A_{k,m})\Bigg]=n_{k,m}p_kq_kE[A_{k,m}(1-A_{k,m})].
\]

For $n\geq 1$,

\begin{align*}
&E\Bigg[\sum_{i=1}^{n_k}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\,\Big|\,\overline{N}_{k,m}=n\Bigg] \\
&\quad = \frac{1}{n}E\Bigg[\sum_{i=1}^{n_k}1\{m_{k,i}=m\}R_{k,i}W_{k,i}\Bigg(\sum_{i=1}^{n_k}1\{m_{k,i}=m\}R_{k,i}W_{k,i}-nA_{k,m}\Bigg)\,\Big|\,\overline{N}_{k,m}=n\Bigg] \\
&\quad = E[A_{k,m}(1-A_{k,m})].
\end{align*}

Therefore,

\begin{align*}
E\Bigg[\sum_{i=1}^{n_k}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})\Bigg] &= E[A_{k,m}(1-A_{k,m})]\Pr(\overline{N}_{k,m}\geq 1) \\
&= q_kE[A_{k,m}(1-A_{k,m})](1-(1-p_k)^{n_{k,m}}).
\end{align*}
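The two steps above rest on a binomial moment calculation and on the probability that a cluster contributes at least one sampled unit. The sketch below checks both, conditioning on a fixed value $A_{k,m}=a$ so that the number of sampled treated units is Binomial($n,a$); averaging $a(1-a)$ over the distribution of $A_{k,m}$ then gives $E[A_{k,m}(1-A_{k,m})]$. The function names are hypothetical and the code is an illustration under these stated assumptions.

```python
from math import comb


def binomial_cross_moment(n, a):
    """E[N1 * (N1 / n - a)] for N1 ~ Binomial(n, a), by exact enumeration.

    N1 plays the role of the number of sampled treated units in a cluster
    with n sampled units and assignment probability a.
    """
    return sum(
        comb(n, j) * a ** j * (1.0 - a) ** (n - j) * j * (j / n - a)
        for j in range(n + 1)
    )


def prob_at_least_one_sampled(p, q, n_km):
    """Pr(at least one unit sampled) = q * (1 - (1 - p)^n_km) for a cluster of size n_km."""
    return q * (1.0 - (1.0 - p) ** n_km)


if __name__ == "__main__":
    # The cross moment equals a * (1 - a) for every n >= 1.
    for n in (1, 5, 20):
        print(n, binomial_cross_moment(n, 0.3))
    print(prob_at_least_one_sampled(0.5, 0.4, 3))
```

The cross moment does not depend on $n$, which is why the conditional expectation in the text does not depend on $n$ either.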

 

For n1𝑛1n\geq 1

E[Rk,iWk,i( Wk,m\displaystyle E[R_{k,i}W_{k,i}(\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$W$\kern-0.18004pt}}}_{k,m} Ak,m)2|mk,i=m, Nk,m=n,Rk,i=1]\displaystyle-A_{k,m})^{2}|m_{k,i}=m,\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$N$\kern-0.18004pt}}}_{k,m}=n,R_{k,i}=1]
E[( Wk,mAk,m)2|mk,i=m, Nk,m=n,Rk,i=1]absent𝐸delimited-[]formulae-sequenceconditionalsuperscriptsubscript W𝑘𝑚subscript𝐴𝑘𝑚2subscript𝑚𝑘𝑖𝑚formulae-sequencesubscript N𝑘𝑚𝑛subscript𝑅𝑘𝑖1\displaystyle\leq E[(\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$W$\kern-0.18004pt}}}_{k,m}-A_{k,m})^{2}|m_{k,i}=m,\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$N$\kern-0.18004pt}}}_{k,m}=n,R_{k,i}=1]
E[Ak,m(1Ak,m)]n.absent𝐸delimited-[]subscript𝐴𝑘𝑚1subscript𝐴𝑘𝑚𝑛\displaystyle\leq\frac{E[A_{k,m}(1-A_{k,m})]}{n}.

Because Pr(Rk,i=1| Nk,m=n,mk,i=m)=n/nk,mPrsubscript𝑅𝑘𝑖conditional1subscript N𝑘𝑚𝑛subscript𝑚𝑘𝑖𝑚𝑛subscript𝑛𝑘𝑚\Pr(R_{k,i}=1|\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$N$\kern-0.18004pt}}}_{k,m}=n,m_{k,i}=m)=n/n_{k,m}, we obtain

E[Rk,iWk,i( Wk,mAk,m)2|mk,i=m, Nk,m=n]E[Ak,m(1Ak,m)]nk,m,𝐸delimited-[]formulae-sequenceconditionalsubscript𝑅𝑘𝑖subscript𝑊𝑘𝑖superscriptsubscript W𝑘𝑚subscript𝐴𝑘𝑚2subscript𝑚𝑘𝑖𝑚subscript N𝑘𝑚𝑛𝐸delimited-[]subscript𝐴𝑘𝑚1subscript𝐴𝑘𝑚subscript𝑛𝑘𝑚E[R_{k,i}W_{k,i}(\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$W$\kern-0.18004pt}}}_{k,m}-A_{k,m})^{2}|m_{k,i}=m,\hbox{\vbox{\hrule height=0.5pt\kern 1.1625pt\hbox{\kern-0.18004pt$N$\kern-0.18004pt}}}_{k,m}=n]\leq\frac{E[A_{k,m}(1-A_{k,m})]}{n_{k,m}},

which implies

\[
E[R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})^{2} \mid m_{k,i}=m,\overline{N}_{k,m}\geq 1]\leq\frac{E[A_{k,m}(1-A_{k,m})]}{n_{k,m}}.
\]

Therefore,

\begin{align*}
&E[R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})^{2} \mid m_{k,i}=m] \\
&\qquad=E[R_{k,i}W_{k,i}(\overline{W}_{k,m}-A_{k,m})^{2} \mid m_{k,i}=m,\overline{N}_{k,m}\geq 1]\Pr(\overline{N}_{k,m}\geq 1 \mid m_{k,i}=m) \\
&\qquad\leq q_{k}\frac{E[A_{k,m}(1-A_{k,m})]}{n_{k,m}}.
\end{align*}

 

Conditional on $\overline{N}_{k,m}=n$ and $A_{k,m}$, the variable $N_{k,m,1}$ has a binomial distribution with parameters $(n,A_{k,m})$. Then, using the formulas for the moments of a binomial distribution, we find that for any integer $n$ such that $1\leq n\leq n_{k,m}$,

\begin{align*}
&E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\,\Big|\,A_{k,m}=a,\overline{N}_{k,m}=n\Bigg] \\
&\qquad=E[(N_{k,m,1}-N_{k,m,1}^{2}/n)^{2} \mid A_{k,m}=a,\overline{N}_{k,m}=n] \\
&\qquad=n^{2}a^{2}(1-a)^{2}+na(1-a)(1-6a+6a^{2})+r_{1}(a)+r_{2}(a)/n,
\end{align*}

where $|r_{1}(a)|$ and $|r_{2}(a)|$ are uniformly bounded in $a\in[0,1]$. Therefore,

\begin{align*}
&E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\,\Big|\,\overline{N}_{k,m}=n\Bigg] \\
&\qquad=n^{2}E[A_{k,m}^{2}(1-A_{k,m})^{2}]+nE[A_{k,m}(1-A_{k,m})(1-6A_{k,m}+6A_{k,m}^{2})] \\
&\qquad\quad+E[r_{1}(A_{k,m})]+E[r_{2}(A_{k,m})]/n.
\end{align*}
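The binomial moment expansion used above can be verified by direct summation over the binomial probability mass function. The sketch below is illustrative only. The explicit remainder terms `r1` and `r2` are not stated in the text; they are an assumption obtained here from the factorial moments of the binomial distribution, namely $r_{1}(a)=11a^{2}(1-a)^{2}-2a(1-a)$ and $r_{2}(a)=a(1-a)-6a^{2}(1-a)^{2}$, both of which are bounded on $[0,1]$.

```python
from math import comb, isclose

def exact_second_moment(n, a):
    """E[(N - N^2/n)^2] for N ~ Binomial(n, a), by direct summation."""
    return sum(
        comb(n, j) * a**j * (1 - a) ** (n - j) * (j - j * j / n) ** 2
        for j in range(n + 1)
    )

def expansion(n, a):
    """n^2 a^2(1-a)^2 + n a(1-a)(1-6a+6a^2) + r1(a) + r2(a)/n."""
    v = a * (1 - a)
    leading = n**2 * v**2 + n * v * (1 - 6 * a + 6 * a**2)
    r1 = 11 * v**2 - 2 * v  # assumed remainder, derived for this sketch
    r2 = v - 6 * v**2       # assumed remainder, derived for this sketch
    return leading + r1 + r2 / n

for n in (1, 2, 5, 40):
    for a in (0.1, 0.5, 0.9):
        assert isclose(exact_second_moment(n, a), expansion(n, a),
                       rel_tol=1e-9, abs_tol=1e-9)
```

With these remainder terms the expansion is exact for every $n\geq 1$, so the check also covers small strata, where the $n^{2}$ term does not dominate.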

It follows that

\begin{align*}
&E\Bigg[\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})^{2}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Bigg] \\
&\qquad=\Bigg(\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})^{2}(n_{k,m}(n_{k,m}-1)p_{k}^{2}q_{k}+n_{k,m}p_{k}q_{k})\Bigg)E[A_{k,m}^{2}(1-A_{k,m})^{2}] \\
&\qquad\quad+\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})^{2}n_{k,m}p_{k}q_{k}E[A_{k,m}(1-A_{k,m})(1-6A_{k,m}(1-A_{k,m}))]+\mathcal{O}(m_{k}q_{k}).
\end{align*}

Therefore,

\begin{align*}
&\frac{1}{n_{k}p_{k}q_{k}}E\Bigg[\sum_{m=1}^{m_{k}}(\tau_{k,m}-\tau_{k})^{2}\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{2}\Bigg] \\
&\qquad\longrightarrow\big(E[A_{k,m}(1-A_{k,m})]-(5+p_{k})E[A^{2}_{k,m}(1-A_{k,m})^{2}]\big)\sum_{m=1}^{m_{k}}\frac{n_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2} \\
&\qquad\quad+p_{k}E[A^{2}_{k,m}(1-A_{k,m})^{2}]\sum_{m=1}^{m_{k}}\frac{n^{2}_{k,m}}{n_{k}}(\tau_{k,m}-\tau_{k})^{2}.
\end{align*}
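The limit above collects the leading terms of the preceding display after division by $n_{k}p_{k}q_{k}$. The following sketch checks that algebra numerically for arbitrary inputs. All names are hypothetical: `sizes` holds the $n_{k,m}$, `dev2` the squared deviations $(\tau_{k,m}-\tau_{k})^{2}$, and `e1`, `e2` stand for $E[A_{k,m}(1-A_{k,m})]$ and $E[A_{k,m}^{2}(1-A_{k,m})^{2}]$. For the purposes of the sketch, $n_{k}$ is taken to be the sum of the $n_{k,m}$, and the $\mathcal{O}(m_{k}q_{k})$ remainder is omitted.

```python
from math import isclose

def normalized_leading_terms(sizes, dev2, p, q, e1, e2):
    """Leading terms of the unnormalized expectation, divided by n_k*p*q."""
    n_k = sum(sizes)  # assumption: strata partition the cluster
    first = e2 * sum(d * (n * (n - 1) * p**2 * q + n * p * q)
                     for n, d in zip(sizes, dev2))
    # E[A(1-A)(1 - 6A(1-A))] = e1 - 6*e2
    second = (e1 - 6 * e2) * sum(d * n * p * q for n, d in zip(sizes, dev2))
    return (first + second) / (n_k * p * q)

def limit_expression(sizes, dev2, p, e1, e2):
    """Right-hand side of the limit, with the same inputs."""
    n_k = sum(sizes)
    linear = sum(n / n_k * d for n, d in zip(sizes, dev2))
    quadratic = sum(n**2 / n_k * d for n, d in zip(sizes, dev2))
    return (e1 - (5 + p) * e2) * linear + p * e2 * quadratic

sizes, dev2 = [3, 7, 10], [0.5, 1.2, 0.1]
a_val = normalized_leading_terms(sizes, dev2, 0.3, 0.6, 0.2, 0.05)
b_val = limit_expression(sizes, dev2, 0.3, 0.2, 0.05)
print(isclose(a_val, b_val, rel_tol=1e-12))
```

The two functions agree exactly, so the only approximation in the limit comes from the omitted remainder, which is of order $m_{k}/(n_{k}p_{k})$ after normalization.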

  Notice that,

\begin{align*}
&E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{4}\,\Big|\,A_{k,m}=a,\overline{N}_{k,m}=n\Bigg] \\
&\qquad=E[(N_{k,m,1}(1-N_{k,m,1}/n))^{4} \mid A_{k,m}=a,\overline{N}_{k,m}=n] \\
&\qquad\leq E[N^{4}_{k,m,1} \mid A_{k,m}=a,\overline{N}_{k,m}=n] \\
&\qquad\leq n^{4}.
\end{align*}

Therefore,

\[
E\Bigg[\Bigg(\sum_{i=1}^{n_{k}}1\{m_{k,i}=m\}R_{k,i}W_{k,i}(W_{k,i}-\overline{W}_{k,m})\Bigg)^{4}\Bigg]=n_{k,m}^{4}p_{k}^{4}q_{k}\left(1+\mathcal{O}\left(\frac{1}{p_{k}\min_{m}n_{k,m}}\right)\right),
\]

uniformly in $m$.

 

Suppose $X_{k,m}=(Z_{k,m,1}+Z_{k,m,2})^{2}$. Let $X_{k,m,1}=Z_{k,m,1}^{2}$ and $X_{k,m,2}=Z_{k,m,2}^{2}$. Now suppose

\[
\sum_{m=1}^{m_{k}}E[X_{k,m,1}^{2}]\longrightarrow 0,
\]

and

\[
\sum_{m=1}^{m_{k}}E[X_{k,m,2}^{2}]\longrightarrow 0.
\]

Using the binomial theorem and Hölder’s inequality, we obtain

\begin{align*}
\sum_{m=1}^{m_{k}}E[X_{k,m}^{2}]
&=\sum_{m=1}^{m_{k}}\sum_{p=0}^{4}c_{p}E[Z_{k,m,1}^{p}Z_{k,m,2}^{(4-p)}] \\
&\leq c\sum_{m=1}^{m_{k}}\sum_{p=0}^{4}E[|Z_{k,m,1}|^{p}|Z_{k,m,2}|^{(4-p)}] \\
&\leq c\sum_{m=1}^{m_{k}}\sum_{p=0}^{4}(E[X_{k,m,1}^{2}])^{p/4}(E[X_{k,m,2}^{2}])^{(4-p)/4} \\
&\leq c\sum_{p=0}^{4}\left(\sum_{m=1}^{m_{k}}E[X_{k,m,1}^{2}]\right)^{p/4}\left(\sum_{m=1}^{m_{k}}E[X_{k,m,2}^{2}]\right)^{(4-p)/4}\longrightarrow 0.
\end{align*}
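The last inequality in the display above is Hölder's inequality for sums, applied with exponents $4/p$ and $4/(4-p)$. As an illustration only, the sketch below checks it numerically for nonnegative sequences `x` and `y`, which play the roles of $E[X_{k,m,1}^{2}]$ and $E[X_{k,m,2}^{2}]$. The sequences are arbitrary pseudo-random numbers, not values from the paper.

```python
import random

def holder_gap(x, y, p):
    """Right-hand side minus left-hand side of
    sum_m x_m^(p/4) * y_m^((4-p)/4)
        <= (sum_m x_m)^(p/4) * (sum_m y_m)^((4-p)/4)."""
    lhs = sum(xi ** (p / 4) * yi ** ((4 - p) / 4) for xi, yi in zip(x, y))
    rhs = sum(x) ** (p / 4) * sum(y) ** ((4 - p) / 4)
    return rhs - lhs

rng = random.Random(0)  # fixed seed so the check is reproducible
x = [rng.random() for _ in range(50)]
y = [rng.random() for _ in range(50)]

# One gap for each power p = 0, ..., 4 in the binomial expansion.
gaps = [holder_gap(x, y, p) for p in range(5)]
print(all(g >= -1e-12 for g in gaps))
```

For $p=0$ and $p=4$ the inequality holds with equality, and for the intermediate powers the gap is nonnegative, so each of the five terms vanishes whenever both sums converge to zero.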